A central goal of mathematical linguistics is to precisely determine the power of a
linguistic theory. Traditionally, formal language theory (the Chomsky hierarchy)
and its generative power analyses have translated this question into the narrower
question of how unrestricted the rule format of a theory is. Modern computational
complexity theory offers another, more useful, translation: how much of what
computational resources does a theory consume? Complexity theory also offers a new
perspective on descriptive adequacy. In a descriptively adequate linguistic theory,
the structural descriptions and computational power of the theory match those of
an ideal speaker-hearer.

The primary goal of this paper is to demonstrate how considerations from
computational complexity theory can inform grammatical theorizing. To this end,
the paper revises generalized phrase structure grammar (GPSG) linguistic theory
so that its computational power more closely matches the limited computational
ability of an ideal speaker-hearer. A second goal is to provide a theoretical
framework within which to better understand the wide range of GPSG models that have
appeared in the theoretical and computational linguistics literature, embodied in
formal definitions as well as in implemented computer programs.

The paper begins with an outline and intuitive complexity analysis of the GPSG
formal system of Gazdar, Klein, Pullum, and Sag (1985). Subsequently, revisions
to the formal system are motivated by complexity and generative concerns. The
revised system is presented along with a detailed account of topicalization, expletive
pronouns, and parasitic gaps. An extensive RGPSG for English is included in
an appendix. This work falls within the GPSG approach to linguistics. Revised
GPSG is, however, less opaque, more tractable, and more linguistically constrained
than standard GPSG theory: GPSG Recognition is EXP-POLY time hard, while
RGPSG Recognition is NP-complete.
1 Introduction and Motivation


A linguistic theory specifies a computational process that assigns structural
descriptions to utterances. This process requires certain computational
resources, such as time or space. In a descriptively adequate linguistic
theory, the computational resources used by the theory match those used by
an ideal speaker-hearer. In this paper, I explain exactly how computational
complexity analysis can be used to revise generalized phrase structure
grammar (GPSG) so that its computational power more closely corresponds to
the limited ability of an ideal speaker-hearer. This work falls within the
GPSG approach to linguistics, as presented in Gazdar, Klein, Pullum, and
Sag (1985), GKPS hereafter. Revised GPSG (RGPSG) is, however, less opaque, more
tractable, and more linguistically constrained than GPSG: GPSG Recognition
is EXP-POLY time hard, while RGPSG Recognition is NP-complete.

Computational complexity theory measures the intrinsic lower-bound
difficulty of obtaining the solution to a problem no matter how the solution
is obtained. It classifies problems according to the amount of computational
resources (for example, time, space, electricity) needed to solve them
on some abstract machine model, typically a deterministic Turing machine.
Complexity classifications are invariant across a wide range of primitive
machine models, all choices of representation, algorithm, and actual
implementation, and even the resource measure itself. For linguists, complexity
analysis provides a new answer to the central question of formal linguistics:
how powerful is a linguistic theory? For computational linguists, it provides
a precise implementation-independent cost measure necessary for informed
parser engineering.

The bulk of this paper is devoted to informally identifying what
computational resources are used by GPSG theory, and determining whether
they are linguistically necessary. GPSG contains five formal devices, each
of which is used to model some linguistic phenomenon or ability, and each
of which requires certain computational resources. I identify those aspects
of each device that cause intractability and then restrict the computational
power of each device to more closely match the (inherent) complexity of the
phenomenon or ability it models. This method reveals the tension between
descriptive adequacy and explanatory power that, when precisely focused by
complexity analysis, I find fascinating. The remainder of the paper presents
the new formal system and exercises it in the domain of topicalization,
expletive pronouns, and parasitic gaps. The conclusion places this work in
perspective in mathematical linguistics.

In my opinion, the primary value of this work lies in the result (revised
GPSG, or RGPSG) as well as in its use of complexity analysis to understand
and improve a major linguistic theory. RGPSG is of value both to
linguists and computational linguists because it is more tractable and easier
to understand, use, and implement. It can be efficiently implemented and
appears to have better empirical coverage than its GPSG ancestor, in addition
to fixing some errors in GKPS. It would be informative for the reader
to compare this work to other “revised GPSGs,” such as the head-driven
phrase structure grammar of Pollard (1984) and the unification categorial
grammar of Zeevat, Klein, and Calder (1987).
2 The GPSG Formal System

A  GPSG  is  a  formal  model  of  linguistic  competence.  As  a  linguistic  model, 
it  must  encode  linguistically  significant  relations,  such  as  domination  and 
predication,  and  it  must  constrain  their  distribution.  And,  as  it  is  formal,  the 
model  must  be  perfectly  explicit.  This  section  outlines  the  GPSG  formal 
system,  as  presented  in  Gazdar,  Klein,  Pullum,  and  Sag  (1985),  GKPS 
hereafter,  and  explains  how  abstract  linguistic  relations  are  formally  encoded 
in  GPSG,  and  how  these  relations  are  formally  constrained. 

2.1  Overview  of  GPSG  Formalisms 

From the perspective of classic formal language theory, a GPSG may be
thought of as a grammar for generating a context-free grammar. The generation
process begins with immediate dominance (ID) rules, which are context-free
productions with unordered right-hand sides. An important feature of
ID rules is that nonterminals in the rules are not atomic symbols (for
example, NP). Rather, GPSG nonterminals are sets of [feature feature-value]
pairs. For example, [N +] is a [feature feature-value] pair, and the set
{[N +], [V -], [BAR 2]} is the GPSG representation of a noun phrase.1
Next, metarules apply to the ID rules, resulting in an enlarged set of ID
rules. Metarules have fixed input and output patterns containing a
distinguished multiset variable W in addition to constants. If an ID rule matches
the input pattern under some specialization of the variable W, then the
metarule generates an ID rule corresponding to the metarule’s output
pattern under the same specialization of W. For example, the passive metarule

VP → W, NP
    ⇓                          (1)
VP[PAS] → W, (PP[by])

says  that  “for  every  ID  rule  in  the  grammar  which  permits  a  VP  to  dominate 
an  NP  and  some  other  material,  there  is  also  a  rule  in  the  grammar  which 
permits  the  passive  category  VP  [PAS]  to  dominate  just  the  other  material 
from  the  original  rule,  together  (optionally)  with  a  PP[by]”  (GKPS:59). 
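
As a rough illustration of how the variable W is bound and reused, the following Python sketch applies a simplified version of the passive metarule to a single ID rule. It rests on several assumptions that are not part of GKPS: categories are treated as atomic strings rather than feature sets, the optional PP[by] is expanded into two separate output rules, and the name `apply_passive` is hypothetical.

```python
from collections import Counter

def apply_passive(rule):
    """Apply a simplified passive metarule to one ID rule.

    A rule is (mother, daughters), where daughters is a Counter used as a
    multiset. Returns the output rules, or [] if the input pattern fails.
    """
    mother, daughters = rule
    if mother != "VP" or daughters["NP"] == 0:
        return []                      # input pattern VP -> W, NP does not match
    w = daughters.copy()
    w["NP"] -= 1                       # W is bound to the remaining daughters
    w = +w                             # unary plus drops zero counts
    with_pp = w.copy()
    with_pp["PP[by]"] += 1             # the optional PP[by] is present
    return [("VP[PAS]", w), ("VP[PAS]", with_pp)]

rule = ("VP", Counter({"V": 1, "NP": 1, "PP": 1}))
for mother, ds in apply_passive(rule):
    print(mother, "->", sorted(ds.elements()))
```

Representing the right-hand side as a multiset rather than a list reflects the fact that ID rules leave their daughters unordered.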

1 Although syntactic categories in GPSG are not atomic symbols, they are traditionally
abbreviated (up to ambiguity) by atomic symbols such as “NP” (as explained below),
which has confused some readers.
Below we use the finite closure problem to determine the cost of applying
metarules in this manner. Principles of universal feature instantiation (UFI)
apply to the resulting enlarged set of ID rules, defining a still larger set of
phrase structure trees of depth one (local trees). One principle of UFI is
the head feature convention, which ensures that phrases are projected from
lexical heads. Informally, the head feature convention is GPSG’s X-bar theory.
We will use the category projection problem to determine, in part, the cost
of mapping ID rules to local trees. Finally, linear precedence statements are
applied to the instantiated local trees. LP statements order the unordered
daughters in the instantiated local trees. The ultimate result, therefore,
is a set of ordered local trees, and these are equivalent to the context-free
productions in a context-free grammar. The resulting context-free grammar
derives the language of the GPSG.

From the perspective of a linguist, GPSG theory contains five
language-particular components: immediate dominance rules, metarules, linear
precedence constraints, feature co-occurrence restrictions (FCRs), and feature
specification defaults (FSDs).2 GPSG theory also provides four
language-universal components: a theory of syntactic features, principles of universal
feature instantiation, principles of semantic interpretation, and formal
relationships among various components of the grammar. In this section, I
provide a brief and linguistically motivated overview of the theory.


2.2  Syntactic  categories 

In current GPSG theory, syntactic categories (nonterminals) encode
abstract linguistic relations and properties as feature-value pairs. Categories
encode subcategorization, agreement, unbounded dependency, predication,
and other syntactically significant relations. If a relation is true of two
categories in a phrase structure tree, then the relation will be encoded in
every category on the unique path between the two categories. For example,
the feature SLASH encodes the gap-filler (unbounded dependency) relation.
Therefore, every category on the path from a gap to its filler will have a
SLASH feature whose value is the category of the filler. Similarly, categories
that are assigned nominative case by a sister category in a local tree will
have a CASE feature whose value is NOM.

2 The following description of the GPSG formal system is taken with substantive
modifications from Barton, Berwick, and Ristad (1987).
More formally, GPSG categories are partial functions that map features
to atomic feature values or to syntactic categories. Categories may also
be thought of as sets of feature specifications. A feature specification is a
pair [feature feature-value] where feature is an atomic symbol and
feature-value is either an atomic symbol or a syntactic category. Thus, [N +]
indicates that the atomic-valued “nominal” feature has the “+” value, while
[SLASH {[BAR 2]}] indicates that the “slash” feature has the category
{[BAR 2]} as its value. In the GPSG system, {[N +], [V -], [BAR 2]} is
a noun phrase, {[N -], [V +], [BAR 2], [SUBJ -]} is a verb phrase, and
{[N -], [V +], [BAR 2], [SUBJ +]} is a verb phrase with a subject, or a
clause. Some features are morphologically realized; for example, in the
GKPS grammar for English, a category bearing the feature specification
[PFORM with] is a prepositional category headed by the preposition with.

I adopt the abbreviatory conventions found in the GPSG literature:
syntactic categories may be abbreviated up to ambiguity. Thus, a noun phrase
containing the additional feature specifications [CASE NOM] and [POSS +]
might be written NP[CASE NOM, POSS +] or even as NP[NOM, +POSS] because
the atomic feature-value NOM may only be associated with the CASE feature.
The category-valued SLASH feature is abbreviated with a trailing slash (‘/’)
character: VP[VFORM PAS, SLASH NP] is usually written VP[PAS]/NP. A
numerical value appearing inside square brackets ([32], for example)
denotes a SUBCAT value, while a numerical value that precedes a set of square
brackets is a BAR value. For example, the category V0[2] abbreviates the
category {[N -], [V +], [BAR 0], [SUBCAT 2]}.

The set K of syntactic categories is specified inductively by listing a set
Feat of features, a set Atom of atomic-valued features, a set A of atomic
feature values, a function p that defines the range of each atomic-valued feature,
and a set R of restrictive predicates on categories (feature co-occurrence
restrictions). The category-valued features in (Feat - Atom) allow categories
to be freely contained within other categories, subject to FCRs (below) and
the restrictive principle of finite feature closure, which prevents a
category-valued feature f from taking categories in which f already appears. That
is, the feature specification [f C] is legal only “if f is not in the domain
of C, or in the domain of any C' contained in C, at any level of
embedding” (GKPS:36).
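
The finite feature closure condition can be checked by a simple recursive walk. The sketch below is only an illustration under stated assumptions: a category is encoded as a Python dict, atomic values are strings, category values are nested dicts, and the function name `is_finitely_closed` is hypothetical. FCRs are not modeled.

```python
def is_finitely_closed(category, enclosing=frozenset()):
    """Check finite feature closure on a category given as a nested dict.

    `enclosing` holds the category-valued features whose values contain the
    category currently being inspected; none of them may reappear inside.
    """
    for feature, value in category.items():
        if feature in enclosing:
            return False               # f occurs inside its own value
        if isinstance(value, dict):    # category-valued feature: recurse
            if not is_finitely_closed(value, enclosing | {feature}):
                return False
    return True

NP = {"N": "+", "V": "-", "BAR": "2"}
legal = {"N": "-", "V": "+", "BAR": "2", "SLASH": NP}
illegal = {"BAR": "2", "SLASH": {"BAR": "2", "SLASH": NP}}
print(is_finitely_closed(legal), is_finitely_closed(illegal))
```

Atomic-valued features such as BAR may recur at any depth; only a category-valued feature nested inside its own value is rejected.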

A category C1 extends a category C2 (written C1 ⊒ C2) if and only
if two conditions hold. For every atomic-valued feature f specified in C2,
it must be true that C1(f) = C2(f), and for every category-valued
feature f specified in C2, it must be true that C1(f) ⊒ C2(f). For
example, {[N +], [V -], [BAR 2], [POSS +]} ⊒ NP, and VP ⋣ S because
VP([SUBJ]) ≠ S([SUBJ]).
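
The extension relation translates directly into a recursive test. This is a minimal sketch that assumes categories are encoded as Python dicts (strings for atomic values, nested dicts for category values); the function name `extends` and the constants are illustrative only.

```python
def extends(c1, c2):
    """Return True if category c1 extends category c2."""
    for feature, v2 in c2.items():
        if feature not in c1:
            return False               # c1 must specify everything c2 does
        v1 = c1[feature]
        if isinstance(v2, dict):       # category-valued: extend recursively
            if not (isinstance(v1, dict) and extends(v1, v2)):
                return False
        elif v1 != v2:                 # atomic-valued: values must be equal
            return False
    return True

NP = {"N": "+", "V": "-", "BAR": "2"}
VP = {"N": "-", "V": "+", "BAR": "2", "SUBJ": "-"}
S = {"N": "-", "V": "+", "BAR": "2", "SUBJ": "+"}

print(extends({**NP, "POSS": "+"}, NP))   # a possessive NP extends NP
print(extends(VP, S))                     # VP and S disagree on SUBJ
```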


2.3  Marking  Conventions 

As one might imagine, not every set of feature specifications that satisfies
finite feature closure is a possible syntactic category. For example, there
are no passive prepositional phrases, and a noun phrase cannot bear the
[PFORM with] specification, which is reserved for prepositional categories.
These constraints are expressed through feature co-occurrence restrictions
(FCRs) and feature specification defaults (FSDs), which are marking
conventions used in the GPSG system both to express language-particular facts
and to restrict the overgeneration of other formal devices (both metarule and
feature closure). FCRs and FSDs are restrictive predicates on categories,
constructed by Boolean combination of feature specifications. All legal
categories must unconditionally satisfy all FCRs. All categories must also satisfy
all FSDs, if it is possible to do so without violating an FCR or a principle
of universal feature instantiation. For example,

FCR 1: [INV +] ⊃ ([AUX +] ∧ [VFORM FIN])

requires any category that bears the [INV +] feature specification to also
bear the specifications [AUX +] and [VFORM FIN].
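
Because an FCR is a Boolean predicate on a single category, it can be sketched as an ordinary function. The example below assumes categories are encoded as Python dicts mapping feature names to atomic values; the function name `fcr1` is hypothetical.

```python
def fcr1(category):
    """FCR 1 as a predicate: [INV +] implies [AUX +] and [VFORM FIN]."""
    if category.get("INV") != "+":
        return True                    # antecedent false: vacuously satisfied
    return category.get("AUX") == "+" and category.get("VFORM") == "FIN"

print(fcr1({"INV": "+", "AUX": "+", "VFORM": "FIN"}))   # satisfies FCR 1
print(fcr1({"INV": "+", "AUX": "-", "VFORM": "FIN"}))   # violates FCR 1
```

A category is legal only if every FCR in the grammar, applied this way, returns true.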


2.4  Immediate  Dominance/Linear  Precedence 

GPSG’s immediate dominance/linear precedence format factors out two
independent relations that compose phrase structure. An ID rule is a
context-free production

C0 → C1, C2, ..., Ck

whose  left-hand  side  (LHS)  is  the  mother  category  and  whose  right-hand 
side  (RHS)  is  an  unordered  multiset  of  daughter  categories,  some  of  which 
may  be  designated  as  head  daughters.  The  LHS  immediately  dominates  the 
unordered  RHS  in  a  tree  of  depth  one  (a  local  tree). 
An LP statement is a pair of category predicates

P1 ≺ P2

that requires a category C1 in a local tree to precede its sister C2 if C1
satisfies P1 and C2 satisfies P2. A predicate is a Boolean combination (∧, ∨, ¬)
of truth-values and feature specifications such that if a category C bears or
extends a given feature specification, that feature specification is true of C,
else false. For example, the LP statement

[SUBCAT] ≺ ¬[SUBCAT]

requires categories bearing a SUBCAT specification to precede categories
unspecified for SUBCAT (that is, lexical categories must precede nonlexical
categories). This LP statement requires lexical heads to precede their
complements, and thereby represents the setting of the “head” parameter for
English.
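
To make the effect of an LP statement concrete, the sketch below filters the possible orderings of a set of sister categories. It assumes categories are encoded as Python dicts and that predicates are plain Python functions; all names (`satisfies_lp`, `has_subcat`, `lacks_subcat`) are illustrative rather than taken from GKPS.

```python
from itertools import permutations

def has_subcat(category):
    """Predicate [SUBCAT]: the category bears some SUBCAT value."""
    return "SUBCAT" in category

def lacks_subcat(category):
    """Predicate ¬[SUBCAT]: the category is unspecified for SUBCAT."""
    return not has_subcat(category)

def satisfies_lp(order, p1, p2):
    """True unless some sister satisfying p2 precedes one satisfying p1."""
    for i, left in enumerate(order):
        for right in order[i + 1:]:
            if p2(left) and p1(right):
                return False
    return True

V0 = {"N": "-", "V": "+", "BAR": "0", "SUBCAT": "2"}
NP = {"N": "+", "V": "-", "BAR": "2"}

legal = [order for order in permutations([V0, NP])
         if satisfies_lp(order, has_subcat, lacks_subcat)]
print(legal == [(V0, NP)])
```

Enumerating all permutations is exponential in the number of daughters and is used here only for clarity.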

The primary advantage of the ID/LP format stems from its partial
decoupling of two independent linguistic relations (see McCawley 1982 for
arguments that these relations are in fact independent): by decoupling the
two relations, GPSG can express the head parameter and capture some
free word order facts.

2.5  Metarules 

Metarules are lexical redundancy rules. Formally, they are functions that
take lexical ID rules (ID rules with a lexical head) to sets of lexical ID rules.
Metarules have a fixed input ID rule pattern containing a mother category,
at most one daughter category, and a distinguished multiset variable W.
W ranges over multisets of daughter categories. If an ID rule matches the
input pattern under some extension of the two pattern categories and some
specialization of the variable W, then the metarule generates an ID rule
corresponding to the metarule’s fixed ID rule output pattern under the same
extension of pattern categories and the same specialization of W. See the GKPS
passive metarule above. The GKPS grammar for English includes metarules
for subject-aux inversion, extraposition, and transitivity alternations.

The  complete  set  of  ID  rules  in  a  GPSG  is  the  maximal  set  that  can 
be  arrived  at  by  taking  each  metarule  and  applying  it  to  the  set  of  rules 
that  did  not  themselves  arise  from  the  application  of  that  metarule.  This 
maximal set is called the finite closure FC(M,R) of a set R of lexical ID
rules  under  a  set  M  of  metarules. 
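
One way to read this definition operationally is that no metarule may apply more than once in the derivation history of any rule. The sketch below computes the closure under that reading. It is an illustration under assumptions, not the GKPS definition itself: rules are hashable tuples with atomic category names, metarules are Python generator functions, and the two toy metarules are hypothetical.

```python
def finite_closure(metarules, rules):
    """Compute FC(M, R), letting each metarule apply at most once per derivation.

    metarules: dict mapping a name to a function rule -> iterable of rules
    rules:     iterable of hashable ID rules
    """
    # Each agenda item pairs a rule with the metarules used to derive it.
    agenda = [(rule, frozenset()) for rule in rules]
    seen = set(agenda)
    while agenda:
        rule, used = agenda.pop()
        for name, metarule in metarules.items():
            if name in used:
                continue               # this rule already arose from `name`
            for new_rule in metarule(rule):
                item = (new_rule, used | {name})
                if item not in seen:
                    seen.add(item)
                    agenda.append(item)
    return {rule for rule, _ in seen}

def passive(rule):
    mother, daughters = rule
    if mother == "VP" and "NP" in daughters:
        rest = list(daughters)
        rest.remove("NP")
        yield ("VP[PAS]", tuple(sorted(rest)))

def add_adv(rule):
    # Hypothetical metarule that would apply forever without the restriction.
    mother, daughters = rule
    yield (mother, tuple(sorted(daughters + ("ADV",))))

closure = finite_closure({"passive": passive, "add_adv": add_adv},
                         [("VP", ("NP", "V"))])
print(len(closure))
```

Tracking the set of metarules already used is what guarantees termination: no derivation can be longer than the number of metarules.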


2.6  Local  trees 


The ID rules obtained by taking the finite closure of the metarules on the
ID rules are projected to local phrase structure trees. Abstractly, this
process establishes the connection between those relations encoded in ID rules
(for example, domination, subcategorization, case, modification, and
predication) and the nonlocal linguistic relations (for example, gap-filler,
agreement, and wh-element scope). Local trees are projected from ID rules by
mapping the categories in a rule into legal extensions of those categories in
the projected local tree.


C0 → C1, C2, ..., Ck

projects to the local tree

          C'0
     /     |     \
  C'1    C'2 ... C'k

where for all i from 0 to k, C'i extends Ci. Because the RHS of an ID rule is
unordered, the C'i could appear in any order (subject to linear precedence
constraints).

Principles of universal feature instantiation (UFI) constrain this
projection by requiring categories in a local tree to agree in certain feature
specifications when it is possible for them to do so. For example, the head feature
convention (HFC) requires the mother to agree with all head features that
the head daughters agree on, if agreement is possible. The HFC expresses
X-bar theory in part, requiring a phrase to be the projection of its head. It also
plays a central role in the GPSG account of coordination phenomena,
requiring the conjuncts in a coordinate structure to all participate in the same
linguistic relations with the rest of the sentence. The two other principles
of UFI are the control agreement principle and the foot feature principle.
The control agreement principle represents the GPSG theory of
predicate-argument relations. Informally, it requires predicates to agree with their
arguments (for example, verb phrases must agree with their subject NPs in
English). The foot feature principle (FFP) provides a partial account of gap-filler
relations in the GPSG system, including parasitic gaps and the binding facts
of reflexive and reciprocal pronouns; it plays a role strikingly similar to that
of Pesetsky’s (1982) path theory and Chomsky’s (1986) binding and chain
theories.3 The foot feature principle requires foot features instantiated on
the mother to be instantiated on at least one of the daughters, and vice
versa. Thus the FFP ensures that certain syntactic information is not lost.
“Exceptional” feature specifications are those feature specifications in an ID
rule that should agree by virtue of a principle of UFI, but are unable to
without changing a feature specification inherited from the ID rule.

Local  trees  are  further  constrained  by  FSDs,  FCRs,  and  LP  statements. 
Finally,  local  trees  are  assembled  to  form  phrase  structure  trees,  which  are 
terminated  by  lexical  elements. 

Let the features {N, V, BAR} and {NEG, POSS} be head features and
non-head features, respectively. Let the symbol H mark head daughters. Then
the ID rule

{[N -], [V +], [BAR 2]} → H[BAR 0], NP

can project to this local tree:

[Local tree: a mother extending {[N -], [V +], [BAR 2]} over the head daughter
H[N -, V +, BAR 0] and an NP daughter carrying a POSS specification.]

In this example, BAR is considered an exceptional feature specification
because the mother’s BAR value (2) conflicts with the head daughter’s BAR value
(0), and it is impossible to resolve the conflict without changing an existing
feature specification.

3 The possibility of expressing the control agreement and foot feature principles as
local constraints on nonlocal relations falls out from the central role of c-command, or
equivalently unambiguous paths, in binding theory. Similarly, the possibility of encoding
multiple gap-filler relations in one feature specification of one category, as in the GPSG
analysis of parasitic gaps, corresponds to the “no crossing” constraint of path theory.
Pesetsky (1982:556) compares the predictions of path theory and principles of UFI when
the two diverge in cases of double extraction (for example, a problem that_i I know who_j
to [a talk to t_j about e_i]) from coordinate structures. He concludes that “the apparent
simplicity of the slash category solution fades when more complex cases are considered.”


3  Classifications  of  Complexity  Theory 


This section introduces the powerful tool of modern computational
complexity analysis in order to apply it to GPSG theory in the next section. Recall
that computational complexity theory measures the intrinsic lower-bound
difficulty of obtaining the solution to a problem no matter how the solution
is obtained. It classifies problems according to the amount of computational
resources (in our case, time and space) needed to solve them on a given
abstract machine (for example, a deterministic Turing machine).

This paper refers to four complexity classes: P, NP, PSPACE, and
EXP-POLY. Below I provide an intuitive geometric characterization of these four
classes through their equivalence class representatives (computation trees).
The classes are defined algebraically as follows.


3.1  Four  Important  Complexity  Classes 

P is the natural and important class of problems solvable in deterministic
Polynomial time, that is, on a deterministic Turing machine in time n^j for
some integer j, where n denotes the size of the problem to be solved.4 P
is considered to be the class of problems that can be solved efficiently. For
example, sorting takes n · log n time in the worst case using a variety of
algorithms, and therefore is efficiently solvable.

NP is the class of all problems solvable in Nondeterministic Polynomial
time. Informally, a problem is in NP if one can guess an answer to the
problem and then verify its correctness, all in polynomial time. For example,
the problem of deciding whether a whole number i is composite is in NP
because it can be solved by quickly guessing a pair of potential divisors, each
greater than 1 and less than i, and then quickly checking if their product equals i.
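
The guess-and-verify pattern can be illustrated by separating the guess, modeled here as a certificate handed to the program, from the deterministic check. This is a minimal sketch; the function name `verify_composite` is hypothetical, and the divisors are required only to lie strictly between 1 and i.

```python
def verify_composite(i, certificate):
    """Check a guessed pair of divisors (a, b) for the claim 'i is composite'.

    The check is a few comparisons and one multiplication, so it runs in
    time polynomial in the number of digits of i.
    """
    a, b = certificate
    return 1 < a < i and 1 < b < i and a * b == i

print(verify_composite(91, (7, 13)))   # a correct guess is accepted
print(verify_composite(97, (1, 97)))   # trivial divisors are rejected
```

The verifier says nothing about how hard it is to find a good guess; membership in NP requires only that a correct guess, once supplied, can be checked quickly.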

PSPACE is the class of problems solvable in deterministic polynomial
space. PSPACE contains NP because polynomial space allows us to simulate
an entire NP computation, but it is not known if the inclusion is proper.
Intuitively, PSPACE is the class of combinatorial two-person games: it
includes the problems of winning generalized versions of Checkers, Go, and
Parker Brothers’ Instant Insanity(TM). Many problems in formal language
theory are known to be PSPACE-complete, such as context-sensitive language
recognition and finite state automaton inequivalence and intersection.

4 Problems must be encoded in a “reasonable” way for a size measure to make sense;
for discussion, see Garey and Johnson (1979).

Finally, EXP-POLY is the class of problems solvable in deterministic
time O(c^f(n)) for some constant c and some polynomial f(n) in n. This class
includes PSPACE and all exponential time problems, and so includes problems
that are provably intractable. No natural problems are known to
be EXP-POLY-complete, although the universal recognition problem for
GPSGs may be.

We say a problem T is C-hard (with respect to polynomial time
reductions) if T is at least as hard computationally as any problem in the
complexity class C: if we had a subroutine that solved T in polynomial time,
then we could write a program to solve any problem in C in polynomial time
on a deterministic Turing machine (essentially by efficiently transforming
the problem in C to T and then solving T with the fast subroutine). Note
that T need not be in C to be C-hard. A problem is C-complete if it is both
C-hard and included in C.

The only known methods for solving NP-complete problems exactly are, in the
worst case, too slow for even the fastest computers. Since it is widely believed,
though not proved, that no faster methods of solution can ever be found for these
problems, NP-complete problems are considered the easiest hard problems.5 However,
some NP-complete problems have highly efficient near-optimal solution
techniques, and some have good average-time behavior, that is, the instances
that occur most often can be efficiently solved. Exponential time-hard
problems, on the other hand, do not succumb to these clever methods. As an
anonymous reviewer noted, this apparent gap between theoretical and
practical intractability does not invalidate complexity analysis. Rather, it makes
complexity analysis all the more valuable as a necessary first step on the path
to efficient solution techniques. And, as is the case here, the only way to
eliminate unnecessarily powerful aspects of a formal system such as GPSG
is to use complexity theory.

5 For additional details, the reader may refer to Lewis and Papadimitriou (1978); Garey
and Johnson (1979); or Barton, Berwick, and Ristad (1987). This last work concentrates
on the relationship between computational complexity and natural language.

Complexity classifications are established with the proof technique of
reduction. A reduction converts instances of a problem T of known complexity
into instances of a problem S whose complexity we wish to determine. The
reduction operates in polynomial time (and in logarithmic space if T is in
P). Therefore, if we had a polynomial time algorithm for solving S, then
we could also solve T in polynomial time, simply by converting instances
of T into S. (This follows because the composition of two polynomial time
functions is also polynomial time.) For instance, if we choose T to be
NP-complete, then the polynomial time reduction from T to S shows that S is
at least as hard as T, or NP-hard. If we were also to prove that S was in
NP, then S would be NP-complete.
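
The shape of such an argument can be illustrated with a small sketch. The
choice of problems is an assumption made only for illustration and is not
taken from this paper: T is Independent Set, S is Clique, and the reduction
complements the input graph. The brute-force solver for S merely stands in
for the hypothetical fast subroutine; all function names are invented here.

```python
from itertools import combinations

def has_clique(n, edges, k):
    """Problem S: does the graph on vertices 0..n-1 contain a k-clique?

    Brute force; a stand-in for the hypothetical fast subroutine for S.
    """
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in edge_set for pair in combinations(group, 2))
        for group in combinations(range(n), k)
    )

def reduce_independent_set_to_clique(n, edges, k):
    """The reduction: complement the graph (polynomial time in n)."""
    edge_set = {frozenset(e) for e in edges}
    complement = [
        (u, v) for u, v in combinations(range(n), 2)
        if frozenset((u, v)) not in edge_set
    ]
    return n, complement, k

def has_independent_set(n, edges, k):
    """Problem T, solved by converting the instance and calling the S solver."""
    return has_clique(*reduce_independent_set_to_clique(n, edges, k))
```

If `has_clique` ran in polynomial time, so would `has_independent_set`,
because the conversion step is itself polynomial; this is the sense in which
S is at least as hard as T.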

3.2  Four  Classes  of  Computation  Trees 

In this paper, the problems of known complexity are based on a class of
bounded computation trees. A computation tree is a possibly infinite tree
of OR-nodes and AND-nodes, each of which contains a Turing machine
configuration. That is, each node contains a state, tape contents, and a head
position. A configuration C immediately dominates its successors, that is,
those configurations reachable in one machine step from C. Each computation
tree completely represents the actions of a given alternating Turing machine
on a given input, as explained in appendix A.1.

We are particularly interested in the four classes of computation trees
that correspond to the complexity classes P, NP, PSPACE, and EXP-POLY.
These four classes of computation trees are defined by restrictions
on space (that is, configuration size), tree depth (depth is a proxy for time),
and the type of branching allowed (see the table in figure 1, and figures
2-4). By providing this conceptual typology, I hope to provide the reader
with an intuitive, functional understanding of complexity classification.


Complexity    Computation Tree Restrictions
Class         Depth         Space         Branching

P             polynomial    polynomial    none
NP            polynomial    polynomial    OR
PSPACE        polynomial    polynomial    AND/OR
EXP-POLY      unbounded     polynomial    AND/OR

Figure 1: The complexity classes P, NP, PSPACE, and EXP-POLY are characterized
by computation trees with restricted depth, space, and branching.


Computation Tree for P. The computation tree for P is simply a
straight line containing a polynomial number of configurations (see figure 2).
To see why, consider the sequence of configurations a deterministic Turing
machine moves through on its way to successfully recognizing an input string
x within a polynomial time bound p(|x|). The machine starts in some initial
configuration C_0 (read head at some starting position, blank r/w work
tape, input x written on r/o input tape, and finite-state control in an initial
state). Then, because it is deterministic, it moves through configurations
C_1, C_2, and so forth until it reaches a final (accepting) configuration. We
may therefore picture the configuration sequence as a straight line. The
machine recognizes the input string x if and only if such a derivation sequence
exists.

A  polynomial  time  DTM  computation  can  use  at  most  polynomial  space, 
and  therefore  the  configurations  themselves  may  require  polynomial  space 
to  represent  their  tape  contents. 


Computation Tree for NP. The computation tree for NP is an OR-tree
of polynomial depth (see figure 3). This is so because an accepting
computation sequence for a polynomial-time-bounded nondeterministic TM
looks like an OR-tree of polynomial depth that is rooted at the initial
configuration C_0. At any step, the machine can take one of a finite number of
nondeterministic branches, leading to new next-state configurations; these
configurations in turn may branch. A computation succeeds if there is any
path from root C_0 to a final (accepting) configuration somewhere on the
fringe of the tree. It is possible that some of these paths may fail or never
terminate, but for the machine to recognize an input, only one sequence
needs to reach an accepting configuration after some finite number of steps.
There is another way of saying the same thing. We may imagine that a final
configuration labels itself true, while any other node propagates the value
true upward if any daughter has it. Then the computation succeeds if the
root somehow ever becomes labeled true. In this picture, all the tree nodes
are OR-nodes because a node gets labeled true if any of its daughters is
labeled true. Again, each configuration requires no more than polynomial
space to write down.

Figure 2: The computation tree for P is simply a straight line containing a
polynomial number p(n) + 1 of configurations.


Computation Trees for PSPACE and EXP-POLY. The computation
tree for PSPACE is an AND/OR tree of polynomial depth, while the
computation tree for EXP-POLY is an AND/OR tree of unbounded depth
and polynomial space (configuration size). (As before, an OR-node is labeled
true if any of its daughters are labeled true, but an AND-node is labeled true
only if all of its daughters are labeled true. Levels of AND-nodes and
OR-nodes alternate in an AND/OR tree.)
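
The labeling rule just described can be written down directly. The following
is a minimal sketch for finite trees only; the `Node` class and its field
names are assumptions introduced for illustration, and real computation
trees may be infinite.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                      # "OR" or "AND"; ignored for leaves
    accepting: bool = False        # only meaningful for leaves
    daughters: List["Node"] = field(default_factory=list)

def label(node: Node) -> bool:
    """Propagate truth values upward from the leaves of a finite tree."""
    if not node.daughters:
        # A final configuration labels itself true if it is accepting.
        return node.accepting
    values = [label(d) for d in node.daughters]
    # OR-nodes need one true daughter; AND-nodes need all of them.
    return any(values) if node.kind == "OR" else all(values)
```

An OR-tree (the NP case) is simply a tree in which every internal node has
`kind == "OR"`, so the same function covers both pictures.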

These  equivalences  are  due  to  a  famous  theorem  of  Chandra,  Kozen  and 
Stockmeyer  (1976)  which  relates  space  and  depth  in  AND/OR  computation 
trees  to  depth  and  space  in  nonbranching  computation  trees.  Their  theorem 
states that (1) unbounded depth AND/OR computation trees using space
s(n) are equivalent to nonbranching computation trees of depth c^{s(n)} for
some constant c, and (2) depth d(n) AND/OR computation trees are
equivalent to nonbranching computation trees using space d(n). An earlier result
is  that  OR  computation  trees  are  equivalent  to  nonbranching  computation 
trees  when  both  are  allowed  unbounded  depth  and  polynomial  space. 

It is important to realize that restrictions on size, depth, breadth, and
branching interact: the beauty of the Chandra-Kozen-Stockmeyer result
is that it relates these seemingly independent restrictions. For example, a
depth d(n) computation tree cannot use more than space d(n) because a
Turing machine can access at most one tape square per move; and a depth
d(n) bounded AND/OR computation tree can have breadth proportional to
c^{d(n)} for some constant c.

Figure 3: A computation tree for NP is an OR-tree of polynomial depth. The
computation succeeds if any accepting state can be reached, as indicated by the
OR-symbols (∨) in the tree nodes. Here, the accepting state is symbolized by a
large dot and the accepting path is marked by a dark line. There could be more
than one accepting state.

In the next section, we define four problems that give insight into the
computational structure of GPSG theory. To determine the complexity of
these problems, we will try to find a structural equivalence between one of
our four classes of computation trees and the GPSG problem S. Without
loss of generality, we restrict our attention to binary branching computation
trees. For our purposes, a structural equivalence is an efficient algorithm for
converting a class T of computation trees into our problem S. By finding
such an equivalence between a class T of computation trees and our problem,
we will have reduced T to our problem S, and now know that S is at least
as hard as T. Therefore, the more general and unrestricted we can make
our chosen class of computation trees, the harder we will have proved our
problem S to be.

Figure 4: PSPACE and EXP-POLY computation trees are polynomial space
bounded AND/OR trees. PSPACE computation trees have polynomial depth,
while EXP-POLY trees have no depth bound. The subcalculation at an OR-level
(∨) succeeds if any of its daughters succeed just as in figure 3, but the
subcalculation at an AND-level (∧) does not succeed unless all of its daughters
do. In this tree, accepting states are symbolized by large dots and the essential
branches of the computation (which make up a pruned computation tree) are
marked by dark lines. Note that the rightmost daughter of C_0 is an AND-node,
which requires every daughter to succeed.


4  Sources  of  Intractability  in  GPSG 


This  section  applies  our  newly  minted  reduction  technique  based  on  compu¬ 
tation  trees  to  the  GPSG  formal  system,  and  thereby  aspires  to  reveal  the 
essential  intuitive  character  of  GPSG’s  complexity. 

We  begin  by  examining  the  computational  complexity  of  two  components 
of  the  GPSG  formal  system  (metarules  and  the  feature  system)  and  show 
how  each  of  these  systems  can  lead  to  computational  intractability.  Then  we 
prove that the universal recognition problem for GPSGs is EXP-POLY-hard,
and hence assuredly intractable. In other words, the fastest recognition
algorithm for GPSGs can take more than exponential time.

These  results  may  appear  surprising,  given  GPSG’s  weak  context-free 
generative  power.  The  goal  of  this  section  is  to  resolve  this  apparent  para¬ 
dox  and  answer  the  important  computational  and  linguistic  questions  raised 
by  the  proofs:  why  GPSG-Recognition  is  so  difficult,  what  aspects  of  the 
GPSG  formalisms  cause  intractability,  and  whether  they  are  linguistically 
necessary. 

4.1  Introduction  to  GPSG  Complexity 

GPSG’s intractability is rooted in its formal attack on the very real
problem of descriptive adequacy. In GPSG, formal devices overgenerate
in order to capture the vast array of cross-linguistic phenomena, and then
constrain the overgeneration to capture exactly the phenomena of a chosen
natural language. Thus, one might almost be able to write one GPSG that
simultaneously generated all natural languages. Furthermore, the
components of GPSG theory can interact with each other in very powerful ways.
For example, one can write ID rules that can access the same linguistic
relations that UFI does, and thereby affect the operation of UFI.

The  final  issue  I  touch  on  before  launching  into  a  complexity  analysis  of 
the  GPSG  formal  system  is  how  we  might  best  choose  problems  to  study 
the  GPSG  formal  system. 

The power of a linguistic theory must be known precisely in order to meet
the competing demands of descriptive adequacy and explanatory power, and
to fully understand the theory. This fundamental question in mathematical
linguistics is answered by measuring the power of the grammars licensed by a
linguistic theory. The power of a grammar is the difficulty of characterizing
its output. The corresponding formal problem is the recognition problem
(RP): Is a given string in a given language or not? Alternatively, defining
the output of a grammar to be a set of structural descriptions results in
the parsing problem: what structural descriptions are assigned by a given
grammar to a given string?

A language may be characterized in extension, by all the grammars that
generate it, or constructively, by a particular grammar that generates it. In
the first case, the fixed language RP (FLRP) is posed: Given an input string
x, is x in L for some fixed language L? It does not matter what grammar
generates L: both grammar and language are fixed (that is, ignored) in the
problem statement. In the second case, the grammar is of interest, and the
universal RP (URP) is posed: Given a grammar G and an input string x,
is x in L(G)? Because the URP determines membership with respect to
a particular grammar, it more closely models the parsing problem, which
must use a grammar to assign structural descriptions.
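
The difference between the two problem statements shows up in the signature
of a recognizer. The sketch below is illustrative only and uses ordinary
context-free grammars in Chomsky normal form rather than GPSGs; the CYK
procedure, the grammar encoding, and the example language {a^n b^n} are all
assumptions chosen for brevity.

```python
def universal_recognize(grammar, start, x):
    """URP: the grammar is part of the input (CYK over a CNF grammar).

    grammar maps each nonterminal to a list of right-hand sides; each is
    a 1-tuple (terminal,) or a 2-tuple (B, C) of nonterminals.
    """
    n = len(x)
    if n == 0:
        return False  # this sketch ignores the empty string
    # table[i][j] holds the nonterminals deriving x[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(x):
        for lhs, rhss in grammar.items():
            if (symbol,) in rhss:
                table[i][0].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for lhs, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right
                           for r in rhss):
                        table[i][span - 1].add(lhs)
    return start in table[0][n - 1]

def fixed_recognize_anbn(x):
    """FLRP for the fixed language {a^n b^n : n >= 1}: no grammar input."""
    half = len(x) // 2
    return len(x) >= 2 and len(x) % 2 == 0 and x == "a" * half + "b" * half

# One CNF grammar for the same language: S -> A B | A T, T -> S B.
ANBN = {
    "S": [("A", "B"), ("A", "T")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
```

In the fixed-language version the grammar has disappeared into the program
text, so its size cannot contribute to the measured complexity; in the
universal version the running time depends on both `grammar` and `x`.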

A central goal of this work is to expose the structure of the computations
specified by GPSG models. In scientific analysis, we strive to make the
assumptions and generalizations that give the best insights, and hence choose
our computational problems by the same criterion. The FLRP does not lead
to  any  insights,  and  therefore  we  choose  to  study  the  gross  power  of  the 
GPSG  formal  system  using  the  URP.  Barton,  Berwick,  and  Ristad  (1987), 
hereafter  BBR,  defends  the  a  priori  desirability  of  this  choice. 

In  order  to  obtain  still  sharper  insights,  we  must  pose  computational 
problems  that  capture  the  detailed  internal  organization  of  the  GPSG  for¬ 
mal  system;  for  these  insights  to  be  relevant,  our  problems  must  be  problems 
that  GPSG  was  “designed”  to  solve.  Recall  that  the  process  of  assign¬ 
ing  structural  descriptions  to  utterances  consists  of  two  conceptual  steps  in 
GPSG:  the  projection  of  ID  rules  to  local  trees  and  the  derivation  of  ut¬ 
terances  from  nonterminals,  using  the  local  trees.  For  these  reasons,  our 
complexity  analysis  begins  by  analyzing  the  complexity  of  two  subproblems 
of  GPSG  projection  (category  projection  and  metarule  finite  closure)  and 
one  subproblem  of  derivation  (unordered  local  tree  recognition),  and  ends 
by  analyzing  the  URP  for  GPSGs. 

The complexity results are established by first exhibiting a structural
equivalence between a computation tree and a GPSG formal object (for
example, a category or parse tree) and then showing how the GPSG object
can be specified in polynomial time by the reduction. For each reduction, I
attempt  to  explain  the  complexity  results  in  terms  of  excesses  in  the  GPSG 
formalisms,  and  motivate  computational  restrictions  on  those  formal  devices. 
The  success  of  the  next  section’s  attempt  to  improve  GPSG’s  computational 
and  linguistic  properties  depends  crucially  on  this  section’s  success  in  tracing 
the  complexity  results  directly  back  to  fine  details  of  the  structure  of  the 
formal  system. 

Two  caveats  are  in  order.  First,  it  is  difficult  to  translate  a  precise  math¬ 
ematical  proof  into  a  more  easily  understood  conceptual  argument  without 
introducing  ambiguity  and  apparent  inconsistency.  Although  I  have  strived 
to  minimize  such  contagion,  it  surely  exists,  and  the  alert  disgruntled  reader 
is  urged  to  seek  out  the  formal  proofs,  which  may  be  found  in  the  appendices 
as  well  as  in  chapters  6-8  of  BBR. 

Second,  for  each  of  the  four  problems  discussed  below,  many  other  for¬ 
mal  restrictions  would  suffice  to  eliminate  the  intractability  they  pinpoint; 
for  example,  all  parts  of  the  GPSG  projection  operation  can  be  arbitrarily 
bounded  by  (large)  constants,  in  which  case  the  projection  operation  could 
in  principle  be  performed  in  constant  time,  although  the  constant  could 
be  large  enough  to  prevent  any  physically  realizable  computer  from  ever 
calculating  the  projection  operation.  I  focus  on  linguistically  plausible  re¬ 
strictions  with  practical  consequences,  and  defend  the  anointed  restrictions 
in  the  next  section. 


4.2  Category  Projection 

To understand how FCRs and FSDs affect the cost of projecting ID rules to
local trees, we define the category projection problem as follows. Let Feat
be a set of feature-names, Atom a set of atomic-valued feature-names, A a
set of atomic feature-values, and p a function from Atom to A. Then the
category projection problem is:

Given a specification (Feat, Atom, A, p) for a set K of syntactic
categories and a set R of FCRs, is a category C or any legal
extension of C in the set K?

This problem will help us understand the effects of FCRs on the GPSG
projection operation.
Alternatively, if we were not interested in ID rule projection or in how
FCRs interact with the rest of the GPSG formal system, we might study the
trivial category verification problem, which is to decide if a given category C
satisfies a given set of FCRs, as Gazdar, Pullum, Carpenter, Klein, Hukari,
and Levine (1988) have done.

Theorem  1  Category  Projection  is  PSPACE-hard. 

Proof. GPSG categories can easily be understood as trees. The
atomic-valued features in a category represent a node in the tree, and a category C
dominates its embedded categories; that is, C immediately dominates all
categories C' such that for some category-valued feature f, C(f) = C'. For
example, the GPSG category

V2[AGR S/NP[3p]]/NP[3s]


corresponds  to  the  tree 


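[Tree diagram: the root category immediately dominates its AGR and SLASH
values, and the AGR value in turn dominates NP[3p].]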


In our reduction, then, we can use categories to represent a binary
branching AND/OR computation tree. 0-level categories represent the nodes
of the computation tree: atomic-valued features f_1, f_2, ..., f_n encode the
machine state, tape contents, head position, branching type, and truth value.
Category-valued features encode domination in the computation tree: the
two category-valued features LEFT_i and RIGHT_i represent the left and right
branches of the computation tree at level i. For example, in the entire
category C_1 constructed by our reduction, the top-level 0-category encodes
the root node of the computation tree, and the values C_1(LEFT_1) and
C_1(RIGHT_1) encode the left and right branches, respectively, of the root node
(initial configuration). Although the category C_1 is undefined for any other
category-valued features, the immediately embedded categories C_1(LEFT_1)
and C_1(RIGHT_1) are each defined for exactly two category-valued features:
LEFT_2 and RIGHT_2. Continuing in this manner, we will need exactly two
category-valued features for each level, to represent the two subtrees
immediately dominated by each node at a given level:

{f_1, f_2, ..., f_n,
  [LEFT_1  {f_1, f_2, ..., f_n,
            [LEFT_2  {...}],
            [RIGHT_2 {...}]}],
  [RIGHT_1 {f_1, f_2, ..., f_n,
            [LEFT_2  {...}],
            [RIGHT_2 {...}]}]}

FCRs maintain the next-move relation between an embedded category C_i
at level i and the two categories it immediately contains, C_i(LEFT_i) and
C_i(RIGHT_i), and calculate the truth labelings of internal nodes according to
branching type.

In order for such a reduction to be polynomial time, we must be able to
write down a set of FCRs and to specify the set K of syntactic categories in
polynomial time. To specify K, we write down a function p and the sets Feat
of features, Atom of atomic-valued features and A of atomic feature values.
The restriction to polynomial time reductions means we can only construct
a polynomial number of features: because atomic features represent
configuration size, we can only represent polynomially-sized configurations; because
category-valued features represent depth and the finite feature closure
restriction prevents a category-valued feature from dominating itself, we can
only represent polynomially deep computation trees. Therefore, we can use
category projection to simulate any polynomial size and depth AND/OR
computation tree: the category projection problem is PSPACE-hard. □
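
The encoding used in this proof can be sketched as follows. This is an
illustration under stated assumptions, not the formal construction:
categories are modeled as nested dictionaries, the atomic feature names
STATE and TRUTH are hypothetical placeholders for f_1, ..., f_n, and the
FCRs that enforce the next-move relation are omitted.

```python
def encode(tree, level=1):
    """Encode a binary computation tree as a nested category (a dict).

    tree is (atomic, left, right): atomic is a dict of atomic-valued
    features; left and right are None or subtrees of the same shape.
    """
    atomic, left, right = tree
    category = dict(atomic)
    if left is not None:
        category[f"LEFT{level}"] = encode(left, level + 1)
    if right is not None:
        category[f"RIGHT{level}"] = encode(right, level + 1)
    return category

def category_valued_features(category):
    """Names of all category-valued features used anywhere in a category."""
    names = set()
    for name, value in category.items():
        if isinstance(value, dict):
            names.add(name)
            names |= category_valued_features(value)
    return names

def count_nodes(category):
    """Number of computation-tree nodes represented by a category."""
    return 1 + sum(count_nodes(v) for v in category.values()
                   if isinstance(v, dict))

def full_tree(depth):
    """A complete binary tree with `depth` levels of nodes (test input)."""
    atomic = {"STATE": "q0", "TRUTH": "unknown"}  # hypothetical features
    if depth == 1:
        return (atomic, None, None)
    return (atomic, full_tree(depth - 1), full_tree(depth - 1))
```

Because each level uses its own pair of feature names, no category-valued
feature ever dominates itself, and the number of feature names grows only
linearly with depth even though the number of encoded nodes can grow
exponentially.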

Let  us  now  restrict  our  attention  to  0-level  categories. 

Theorem  2  0-Level  Category  Projection  is  NP-hard. 

Proof. To prove theorem 2, we must hide an entire depth d(n)
OR computation tree in a 0-level category. Rather than use the entire set
Atom to encode one node of the computation tree (a Turing machine
configuration) as above, we will use the set Atom to encode the entire pruned OR
computation tree. To do this, we partition the set Atom into d(n) subsets
Atom_1, Atom_2, ..., Atom_{d(n)}, each of size d(n). (Recall that the nodes of a
computation tree can be no larger than the depth of that tree.) The atomic
features in Atom_i will encode the i-th node of the computation tree. Each fully
specified 0-level category represents a polynomial number of polynomial size
Turing machine configurations, instead of merely one such configuration as
in the preceding proof. This can be done in polynomial time because a
polynomial times a polynomial is also a polynomial. Finally, a polynomial
number of disjunctive consequence FCRs relate the features Atom_i
representing node i to the features Atom_{i+1} representing the successor node i + 1
according to the next-move relation of the OR computation tree.

In such a manner it is possible to hide an entire pruned OR computation
tree in a 0-level category, and prove that the 0-level category projection
problem is NP-hard. □

Note,  however,  that  the  proof  of  this  theorem  is  much  clearer  when 
the  NP-complete  Satisfiability  problem  is  used  instead  of  a  node-bounded 
computation  tree.  Ristad  (1986)  contains  such  a  proof. 


Restricting the Theory of Syntactic Features. As we just saw, the
primary computational resource provided by the theory of syntactic features
is polynomial space. This arises from finite feature closure, which generates
a surprisingly large number of possible syntactic categories. Ristad (1986)
observes that even if all atomic-valued features are restricted to be
binary-valued, finite feature closure admits O(3^{a·b!}) GPSG categories, where a is
the number of atomic-valued features and b the number of category-valued
features. In fact, there are more than 10^775 categories in the GKPS system.
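
To get a feel for this growth, one can count categories under a simplified
model. The recurrence below is an assumption of this sketch, not a formula
quoted from Ristad (1986): each binary atomic feature is +, -, or undefined,
and each category-valued feature is either undefined or takes as its value a
category built from the remaining category-valued features, so that no
feature dominates itself.

```python
def count_categories(a: int, b: int) -> int:
    """Count categories with a binary atomic and b category-valued features.

    Assumed model: 3 choices per atomic feature; each category-valued
    feature is undefined or holds a category over the other b - 1
    category-valued features.
    """
    zero_level = 3 ** a
    if b == 0:
        return zero_level
    return zero_level * (1 + count_categories(a, b - 1)) ** b
```

Even for small inputs the count explodes: adding one category-valued
feature raises the previous count to a power, which is where the factorial
in the exponent comes from.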

Fortunately, the full power of embedded categories does not appear to be
linguistically necessary because no category-valued feature need ever contain
another in an ID rule. To be precise, although a category-valued feature f
may appear inside another category-valued feature g in the parse tree of
some utterance in some language, f will never be required to appear inside
g in a rule of any natural language grammar. In GPSG, there are four
category-valued features: SLASH, which marks the path between a gap and
its filler with the category of the filler; AGR, which marks the path between
an argument and the functor that syntactically agrees with it (between the
subject and matrix verb, for example); WH, which marks the path between
a wh-word and the minimal clause that contains it with the morphological
type of the wh-word; and RE, which marks the path between an anaphor and
its antecedent with the category of the anaphor.

No category-valued feature f need ever contain a category-valued
feature g because (1) for foot features, the path that g marks need never be
extended by the path that f marks: g could just as well cover a longer path
containing both its former path and the path of f, and (2) for the head
feature AGR, the path that f marks need never be dependent on the path a
foot feature g marks. Specifically, RE will not contain a category-valued
feature, because the value of RE is the category of an anaphor and anaphors are
nominals without internal phrase structure. AGR will never contain SLASH
or RE because a functor (verb or predicate) will never select a gap, a
constituent containing a gap, or a category that must be an antecedent to some
unknown anaphor as its argument. SLASH need never contain RE because the
path between a gap and its filler is never dependent on the path between
an anaphor and its antecedent; SLASH will never be required to contain AGR
because such a category corresponds to “the following imaginary (and rather
weird) case: Suppose we found a language in which finite verb phrases could
be fronted over an unbounded domain provided that they were in the
agreement form associated with third-person-singular NP controllers” (Pullum,
personal communication). Finally, because the value of WH is the category
of a wh-noun phrase, and because wh-nominals are never required to
contain gaps, WH need never contain SLASH or AGR. In point of fact, no category
embeddings appear in the GKPS grammar for English, and it is difficult to
see why they would be necessary for any other natural language.6

6 The central apparent counterexample to these arguments is cases of multiple
extractions (see footnote 3 above). Both GPSG and RGPSG must be modified in order to handle
this phenomenon. One way is to abandon closure constraints on category embeddings, in
order to allow SLASH to take as its value a category specified for SLASH (so-called “recursive
SLASH”). Such a change would, in my opinion, be disastrous because it abrogates feature
closure, a central descriptive and computational constraint in the GPSG category system.
The revised principles of UFI needed to constrain recursive SLASH would, I suspect, be
very tricky to state. A simpler approach is to allow SLASH to take a strictly bounded
sequence of values, where a category precedes all categories in the sequence if and only if
its extraction path properly contains their extraction paths. Then the revised FFP might
enforce the path containment condition by limiting ID rules to only affecting (adding or
removing) the last category in the SLASH sequence on a local tree. (That is, feature
instantiation in ID rule projection can only add categories to the front of the SLASH sequence.)
A still simpler, although seemingly less elegant, approach is to add a new foot feature, say
SUBSLASH, to encode an extraction path contained inside the extraction path marked by
SLASH.

Let us now explicitly adopt the strategy of restricting computationally
costly devices in the absence of direct linguistic counterevidence. This
strategy will result in the most constrained and falsifiable theory. The obvious
revision, then, is unit feature closure: to limit category-valued features to
containing only 0-level categories. This revision makes categories and
0-categories equivalent for the polynomial time reductions of computational
complexity theory.


Restricting Marking Conventions. Satisfying FCRs and FSDs in the
GPSG category projection process requires significant computational
resources. First, each allows the projection process to reuse the polynomial
space provided by the theory of syntactic features, because each can
establish equivalences between the features in a category C and the features in a
category embedded in C. This ability to apply across embedded categories
vastly increases the complexity of the rule-to-tree projection. To see why
it is linguistically unnecessary, consider the role of embedded categories. A
category-valued feature f expresses a nonlocal linguistic relation between a
category C and the one or more connected categories that bear the feature
specification [f C]. Thus, in the linguistically relevant cases, every
embedded category eventually “surfaces” in phrase structure, where the marking
conventions are free to apply. The one exception to this argument is FCR
13 in the GKPS grammar for English, which applies ‘across’ an embedded
category.


FCR 13: [FIN, AGR NP] ⊃ [AGR NP[NOM]]

In  RGPSG,  marking  conventions  may  not  apply  to  or  across  embedded  cat¬ 
egories.  The  effect  of  FCR  13  is  achieved  in  RGPSG  by  a  combination  of 
carefully  written  ID  rules  and  the  simple  default  SD  2  in  section  6.2  below. 

Second, FCRs and FSDs of the “disjunctive consequence” form [f v] ⊃
[f_1 v_1] ∨ ... ∨ [f_n v_n] allow us to simulate the next-move relation of an
OR computation tree; from another perspective they compute the direct
analog of the NP-complete Satisfiability problem: when several such FCRs
are used together, the GPSG must nondeterministically try all n
feature-value combinations.
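
The connection to Satisfiability can be made concrete with a small sketch.
The representation is an assumption made for illustration: a 0-level
category is a dictionary from feature names to atomic values, and each
disjunctive-consequence FCR is a pair of an antecedent specification and a
list of alternative consequent specifications. The search is brute force and
therefore exponential in the number of unspecified features.

```python
from itertools import product

def satisfies(category, fcrs):
    """True if every FCR whose antecedent holds has a true disjunct."""
    for (feature, value), disjuncts in fcrs:
        if category.get(feature) == value:
            if not any(category.get(f) == v for f, v in disjuncts):
                return False
    return True

def legal_extensions(partial, features, values, fcrs):
    """Yield fully specified extensions of `partial` satisfying the FCRs."""
    free = [f for f in features if f not in partial]
    for choice in product(values, repeat=len(free)):
        category = dict(partial, **dict(zip(free, choice)))
        if satisfies(category, fcrs):
            yield category

# Hypothetical encoding of the formula (x1 or not x2) and (not x1):
# each clause becomes one FCR triggered by its own clause feature.
FEATURES = ["CLAUSE1", "CLAUSE2", "X1", "X2"]
FCRS = [
    (("CLAUSE1", "+"), [("X1", "+"), ("X2", "-")]),
    (("CLAUSE2", "+"), [("X1", "-")]),
]
START = {"CLAUSE1": "+", "CLAUSE2": "+"}
```

Here `list(legal_extensions(START, FEATURES, ["+", "-"], FCRS))` contains
exactly the satisfying assignments of the formula, so deciding whether any
legal extension exists is at least as hard as deciding satisfiability.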

Third,  the  process  of  applying  feature  specification  defaults  to  local 
trees  is  very  complex,  in  part  because  it  is  not  informationally  encapsu¬ 
lated.  Rather  than  simply  considering  the  (existing)  feature  specifications 
in  each  target  category  separately,  FSD  application  is  affected  by  the  other 
categories  in  the  ID  rule,  all  principles  of  universal  feature  instantiation,  and 
even  FCRs. 
There is no reason to believe that marking conventions need be as powerful
and unconstrained as FCRs and FSDs. The approach RGPSG takes
is to virtually eliminate marking conventions. Rather than stating the internal
constraints on categories explicitly (and redundantly), as FCRs do,
RGPSG  eliminates  FCRs  altogether.  Instead,  the  constraints  FCRs  express 
are  implicitly  stated  in  the  rest  of  the  grammar  —  in  the  way  ID  rules  and 
metarules  are  written,  for  example. 

Reducing  the  power  of  marking  conventions  and  other  grammatical  de¬ 
vices  might  appear  to  hinder  the  grammar  writer.  But  we  help  the  grammar 
writer  understand  the  consequences  of  a  grammar  by  reducing  the  complex¬ 
ity  of  grammatical  derivations:  if  a  supercomputer  cannot  possibly  deter¬ 
mine  membership  in  the  language  of  an  intractable  grammar,  then  surely  no 
human grammar writer can (see appendix A.3 for empirical confirmation).
Evans  (1985:213)  observes  exactly  this  practical  consequence  of  GPSG’s  the¬ 
oretical  intractability:  “The  GPSG  theory  is  complex  enough  that  ensuring 
that  any  but  a  small  grammar  behaves  as  expected  is  a  difficult  task.” 

4.3  Metarule  Finite  Closure 

In  the  finite  closure  membership  problem  for  GPSG  metarules,  we  are  given 
an  ID  rule  r,  a  set  M  of  metarules,  and  a  set  R  of  ID  rules.  The  problem  is 
to  decide  whether  r  is  in  FC(M,R).  This  subproblem  of  GPSG  projection 
will  allow  us  to  pinpoint  the  contribution  of  metarules  to  the  complexity  of 
ID  rule  projection. 

Theorem  3  Metarule  Finite  Closure  Membership  is  NP-hard. 

Proof.  The  insight  underlying  the  following  reduction  is  that  the  finite 
closure  operation  specifies  an  OR  computation  tree  whose  nodes  are  ID 
rules, where metarules enforce the next-move relation between adjacent
nodes. The polynomial-time reduction restriction limits us to constructing
at most a polynomial number of metarules, while metarule finite closure
restricts  each  metarule  to  at  most  one  application  in  a  given  computation 
tree:  metarules  by  themselves  are  only  capable  of  simulating  a  polynomial 
depth  computation  tree.  The  finite  closure  operation  on  metarules  gives 
the GPSG formal system the power of nondeterminism (OR branching) because
all possible permutations of metarules are applied; using this power,
we can use a metarule system to simulate any polynomial depth OR computation
tree. Therefore, the metarule finite closure membership problem is
NP-hard. □
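
The bounded-application idea behind finite closure can be illustrated with a small sketch. The encoding is an assumption made only for illustration (an ID rule as a pair of mother and daughters, a metarule as a named Python function returning zero or more derived rules); it is not the GKPS definition. The search applies each metarule at most once along any derivation path and tries every order, which is the OR branching the proof relies on. The optional bound `max_apps` is a hypothetical parameter that models the smaller closures considered in this section.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: an ID rule is (mother, daughters).
Rule = Tuple[str, Tuple[str, ...]]
# A metarule maps one ID rule to zero or more derived ID rules.
Metarule = Callable[[Rule], Iterable[Rule]]


def closure(rules: Iterable[Rule],
            metarules: Dict[str, Metarule],
            max_apps: Optional[int] = None) -> Set[Rule]:
    """Rules reachable when each metarule applies at most once per derivation.

    max_apps=None allows up to len(metarules) applications (finite closure);
    max_apps=2 or max_apps=1 models the bounded alternatives.
    """
    limit = len(metarules) if max_apps is None else max_apps
    start = [(rule, frozenset()) for rule in rules]
    seen: Set[Tuple[Rule, FrozenSet[str]]] = set(start)
    frontier: List[Tuple[Rule, FrozenSet[str]]] = list(start)
    while frontier:
        rule, used = frontier.pop()
        if len(used) >= limit:
            continue
        for name, metarule in metarules.items():
            if name in used:  # no metarule applies twice on one path
                continue
            for derived in metarule(rule):
                state = (derived, used | {name})
                if state not in seen:
                    seen.add(state)
                    frontier.append(state)
    return {rule for rule, _ in seen}


def tag(suffix: str) -> Metarule:
    """Toy metarule that marks the mother category; purely illustrative."""
    def apply(rule: Rule) -> List[Rule]:
        mother, daughters = rule
        return [(mother + suffix, daughters)]
    return apply


base = [("VP", ("H", "NP"))]
toy = {"a": tag("+a"), "b": tag("+b"), "c": tag("+c")}
print(len(closure(base, toy)))              # every order of up to three metarules
print(len(closure(base, toy, max_apps=2)))  # at most two applications per path
```

Membership of a rule r in the closure is then the test `r in closure(R, M)`. The sketch enumerates the closure explicitly, so its running time grows with the number of ordered metarule selections.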

Restricting  Metarules.  Metarule  finite  closure  generates  many  linguisti¬ 
cally  incorrect  ID  rules  that  must  be  excluded  by  other  GPSG  devices,  such 
as  FCRs.  For  example,  the  result  of  applying  the  Extraposition,  Passive, 
and  Subject-Aux  Inversion  metarules  in  order  to  the  lexical  ID  rule  2 

VP[AGR S] → H[20], NP (2)

is  the  lexical  ID  rule  3, 

S[INV, PAS, AGR NP[it]] → H[20], S, NP, (PP[by]) (3)

which  does  not  generate  sentences  in  the  English  language  and  (thankfully) 
is  excluded  by  FCR  1. 

The GKPS grammar for English contains six metarules; out of approximately
1944 possible metarule interactions in principle, only two such interactions
appear to be productive (passive followed by subject-aux inversion
or slash termination metarule 1).7 Therefore, the second metarule restriction
adopted by RGPSG is biclosure, instead of finite closure. Alternately,
we  might  restrict  metarules  to  unit  closure,  or  follow  Pollard  (1985)  in  elim¬ 
inating  metarules  altogether. 

Lacking  an  alternate  theory  of  lexical  redundancy  rules,  honesty  compels 
us  to  include  metarules  in  RGPSG  in  some  form.  How  then  can  we  choose 
between unit closure and biclosure on principled grounds? Metarule biclosure
does not overgenerate as badly as finite closure, and thereby promotes
descriptive  adequacy  at  the  expense  of  some  explanatory  power.  Biclosure 
has  an  edge  in  descriptive  economy  over  unit  closure  because  simpler  (and 
fewer)  metarules  are  needed  with  biclosure  than  with  unit  closure.  Thus, 
although  the  length  of  metarule  derivations  is  not  subject  to  direct  empirical 

7 Given a set of n metarules, the number of possible metarule interactions is the number
of ways to pick n or fewer metarules from the set, where order matters and repetitions are
not allowed. That number is given by the total number of possible k-selections from the
n metarules, where k varies from 0 (no metarules apply) to n (any combination of all
metarules apply). Thus, the number of possible interactions f(n) is:
$f(n) = \sum_{k=0}^{n} \frac{n!}{(n-k)!} \approx n! \cdot e$.

Note that this is not the size of metarule finite closure, because it does not consider the
possibility of a metarule matching an ID rule in more than one way.
evidence, it is not entirely ad hoc because it is subject to the scientific criteria
of descriptive economy, descriptive adequacy, and explanatory power.

4.4  Unordered  Local  Tree  Recognition 

The  universal  recognition  problem  for  unordered  local  trees  is  to  decide  if 
a given string x can be derived from a set of unordered local trees P. This
is  equivalent  to  the  unordered  context-free  grammar  recognition  problem 
considered  by  E.  Barton  in  BBR. 

Theorem 4 Unordered Local Tree Recognition is NP-complete.

This  is  the  only  theorem  we  will  not  prove  here  using  our  typology  of 
computation  trees.  Barton  shows  how  the  multiset  RHS  of  an  ID  rule  con¬ 
tributes  to  an  exponentially  large  space  of  local  phrase  structure  trees:  an  ID 
rule with a RHS of cardinality b can, if unconstrained by LP statements,
correspond to b! ordered productions. In parsing practice, this can cause a
combinatorial  explosion  in  a  context-free  parser’s  state  space.  In  addition 
to  causing  nondeterminism  in  any  GPSG-based  parser,  the  multiset  RHS 
confers on GPSG the ability to count nonterminals. The apparent artificiality
of this device, as discussed in BBR (pp. 260-261), will motivate RGPSG
to  adopt  a  substantive  constraint  of  short  ID  rules  (binary  branching,  for 
example).8 
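
The b! expansion of a multiset RHS can be made concrete with a short sketch. The representation is assumed for illustration only: an ID rule is a mother plus a sequence of daughter labels, and an LP statement is a pair (a, b) read as "every a precedes every b". The function names are hypothetical.

```python
from itertools import permutations
from typing import Iterable, Sequence, Set, Tuple


def respects_lp(order: Sequence[str], lp: Iterable[Tuple[str, str]]) -> bool:
    """True if, for each LP pair (a, b), every a precedes every b in order."""
    for a, b in lp:
        pos_a = [i for i, cat in enumerate(order) if cat == a]
        pos_b = [i for i, cat in enumerate(order) if cat == b]
        if pos_a and pos_b and max(pos_a) > min(pos_b):
            return False
    return True


def ordered_productions(mother: str,
                        daughters: Sequence[str],
                        lp: Iterable[Tuple[str, str]] = ()
                        ) -> Set[Tuple[str, Tuple[str, ...]]]:
    """Expand one ID rule into the ordered productions its LP statements allow."""
    lp = list(lp)
    return {(mother, order)
            for order in permutations(daughters)
            if respects_lp(order, lp)}


unconstrained = ordered_productions("VP", ("H", "NP", "PP", "S"))
head_first = ordered_productions("VP", ("H", "NP", "PP", "S"), [("H", "NP")])
print(len(unconstrained), len(head_first))
```

With four distinct daughters and no LP statements the expansion has 4! = 24 members; each LP statement prunes the set, and a short RHS keeps b! small.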


4.5  GPSG  Recognition 

The  ultimate  problem  we  analyze  is  the  universal  recognition  problem  for 
GPSGs:  given  a  GPSG  and  a  string,  is  the  string  in  the  language  of  the 
GPSG?  (Recall  that  the  URP  was  chosen  as  the  problem  statement  that 
best  characterized  the  overall  power  of  a  grammar.) 

8 The binary branching constraint is independently motivated by the linguistic arguments
of Kayne (1981) and others. In that work, Kayne argues that the path from a
governed  category  to  its  governor  (for  example,  from  an  anaphor  to  its  antecedent)  must 
be  unambiguous — informally  put,  “an  unambiguous  path  is  a  path  such  that,  in  tracing 
it  out,  one  is  never  forced  to  make  a  choice  between  two  (or  more)  unused  branches,  both 
pointing  in  the  same  direction”  (Kayne  1981:146).  The  unambiguous  path  requirement 
sharply  constrains  fan-out  in  phrase  structure  trees  because  n-ary  branching,  for  n  >  2,  is 
only  possible  when  none  of  the  n  sister  nodes  must  govern  any  other  nodes  in  the  phrase 
structure  tree. 
Theorem 5 GPSG Recognition is EXP-POLY-hard.


Proof.  The  idea  of  this  reduction  is  to  transparently  encode  a  pruned 
polynomial  space  AND/OR  computation  tree  in  a  GPSG  parse  tree.  Every 
node  in  the  computation  tree  will  be  represented  by  a  category  in  the  GPSG 
parse tree. The reduction preserves the invariant that a GPSG category can be
terminated  if  and  only  if  the  configuration  that  it  represents  can  be  labeled 
true  in  the  pruned  computation  tree. 

As before, 0-level categories encode nodes of the computation tree, which
are  Turing  machine  configurations.  Local  trees  represent  the  pruned  next- 
move  relation  between  nodes:  a  local  tree  with  one  daughter  represents  a 
pruned  OR  node,  while  a  local  tree  with  two  daughters  represents  a  pruned 
binary-branching  AND  node.  The  leaves  of  the  pruned  computation  tree 
have  halted,  accepting  configurations;  these  accepting  nodes  are  represented 
by  a  local  tree  with  no  daughters.  There  are  no  lexical  entries  in  this  GPSG, 
and  therefore  the  only  way  to  terminate  a  category  in  this  GPSG  is  with 
such  a  “null  transition.”  Thus,  the  GPSG  parse  tree  will  terminate  in  a  very 
long  empty  string. 

Now  we  must  show  how  such  an  exponentially  large  parse  tree  can  be 
specified  in  polynomial  time.  The  reduction  must  first  list  enough  atomic 
features  to  represent  the  largest  node  in  the  computation  tree;  this  is  pos¬ 
sible  because  the  size  of  each  node  is  bounded  by  a  polynomial,  as  is  the 
reduction.  We  will  not  be  able  to  write  down  all  the  local  trees  required  in 
polynomial  time,  because  there  are  an  exponential  number  of  them.  (In  fact, 
approximately c^3 local trees are needed, where c is the number of possible
configurations,  which  we  know  to  be  exponential  in  the  polynomial  size  of 
the  configurations.) 

Instead, we will use ID rules to encode the alternating Turing machine
transition relation δ, which is infinitely smaller than the corresponding next-move
relation. Recall that δ is a relation between a tuple and a triple: the
tuple contains a machine state and currently scanned tape symbol, while
the triple contains a new machine state, a new symbol to write on the
tape, and a direction to move the read/write head. The next-move relation
is a relation between two configurations that obey the δ relation. Each
transition in δ licenses infinitely many next-move relations between nodes of
the computation tree because δ does not care about tape squares that the
machine is not currently scanning. For every binary OR transition licensed
by δ, we will build two nonbranching ID rules C → C1 and C → C2, one
for each of the two possible pruned OR transitions (recall that a pruned
OR branch is a straight line). For every binary AND transition found in δ,
we will build a branching ID rule C → C1, C2. Therefore, an OR category
can be terminated iff one of its possible daughters can be, while an AND
category can be terminated iff all of its daughters can be. This corresponds
exactly to the labeling rules for an AND/OR computation tree: an OR
node is labeled true iff one of its possible daughters is, while an AND node
is labeled true iff all of its daughters are.

Next, we add a lone ID rule C_accept → ε to terminate nodes representing
halted,  accepting  configurations  with  the  empty  string.  Because  there  are  no 
lexical  entries  in  this  GPSG,  the  only  categories  that  can  be  terminated  are 
those  that  represent  nodes  that  have  been  labeled  true  in  the  computation 
tree. 
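
The correspondence between terminable categories and nodes labeled true can be sketched as a fixpoint computation. This is only an illustration under stated assumptions: categories are plain strings, ID rules are listed explicitly (which the actual reduction cannot afford, since there are exponentially many local trees), and the category names are hypothetical. Several rules for one mother behave as an OR choice, while the daughters of a single rule behave as an AND.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding: (mother, daughters); an empty tuple is a null transition.
IDRule = Tuple[str, Tuple[str, ...]]


def terminable(id_rules: Iterable[IDRule]) -> Set[str]:
    """Categories that can be terminated in a grammar with no lexical entries."""
    rules = list(id_rules)
    done: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for mother, daughters in rules:
            # A rule succeeds only if all of its daughters terminate (AND);
            # a mother needs just one successful rule (OR).
            if mother not in done and all(d in done for d in daughters):
                done.add(mother)
                changed = True
    return done


toy_rules = [
    ("C_accept", ()),           # lone null transition for accepting nodes
    ("C0", ("C1",)),            # OR node: two nonbranching rules
    ("C0", ("C2",)),
    ("C1", ("C3", "C4")),       # AND node: one branching rule
    ("C2", ("C_accept",)),
    ("C3", ("C_accept",)),
    ("C4", ("C5",)),            # C5 has no rule, so C4 and C1 never terminate
]
print(sorted(terminable(toy_rules)))
```

In this toy grammar C0 is terminable through C2 even though its other successor C1 is not, which mirrors the labeling rule for an OR node.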

The final step is to make the atomic features that represent the tape
contents be head features, and insist all daughters be heads. An ID rule
that encodes a δ-transition will then project into the local trees that represent
all possible next-moves licensed by that δ-transition. The head feature
convention, which governs the projection of ID rules into local trees,
will ensure that tape squares not altered by the tape-writing activity specified
by the δ-transition will be identical on the mother and all daughters
in the projected local trees. In this fashion we can use a GPSG to simulate
any unbounded depth polynomial space AND/OR computation tree.
Therefore, GPSG Recognition is EXP-POLY time-hard (details are in appendix
A.1). □

What  are  the  implications  of  this  result  for  GPSG  and  natural  language? 
At  first  glance,  it  is  unclear  whether  we  have  exposed  an  oversight  in  the 
way  GPSG  was  formalized  (and  if  so,  how  easily  may  it  be  remedied?)  or  an 
inherent  property  of  natural  language  grammars.  Equally  unclear  is  whether 
the  intractability  arises  in  practice  or  is  merely  an  artifact  of  the  complexity- 
theoretic  idealization  to  unbounded  inputs.  But  first  we  reconcile  this  result 
with  the  fact  that  context-free  languages  may  be  recognized  in  polynomial 
time. 
4.5.1 Interpreting the Result


At  first  glance,  a  proof  that  GPSG  Recognition  is  EXP-POLY  hard  appears 
to contradict the fact that context-free languages can be recognized in O(n^3)
time  by  a  wide  range  of  algorithms.  To  see  why  there  is  no  contradiction,  we 
must  first  explicitly  state  the  argument  from  weak  context-free  generative 
power,  which  we  will  call  the  efficient  processability  (EP)  argument. 


The  Efficient  Processability  Argument.  The  main  thrust  of  the  EP 
argument  runs  as  follows: 

•  Any  GPSG  can  be  converted  into  a  weakly  equivalent  context-free 
grammar  (CFG). 

•  CFG  recognition  can  be  accomplished  in  polynomial  time. 

•  Therefore,  GPSG  recognition  can  also  be  accomplished  in  polynomial 
time. 

The  argument  continues: 

•  If  the  conversion  is  fast,  then  GPSG  recognition  is  fast.  However,  even 
if  the  conversion  is  slow,  recognition  with  the  “compiled”  CFG  will  still 
be  fast;  we  may  justifiably  lose  interest  in  doing  recognition  with  the 
original,  slow  GPSG. 

The  EP  argument  is  misleading  because  it  ignores  both  the  effect  con¬ 
version  has  on  grammar  size  and  the  effect  grammar  size  has  on  recognition 
speed.  Crucially,  grammar  size  affects  recognition  time  in  all  known  CFG 
recognition  algorithms.  The  only  grammars  directly  usable  by  context-free 
parsers — hence  the  only  grammars  for  which  rapid  parsing  results  carry 
over — are  those  composed  of  context-free  productions  with  atomic  nonter¬ 
minal  symbols.  For  GPSG,  this  corresponds  to  the  set  of  admissible  local 
trees,  and  this  set  is  astronomical.  Ignoring  the  effects  of  ID/LP  format,  it 
is 

$O(3^{m! \cdot m^{2m+1}})$  (4)

in  a  GPSG  G  containing  m  symbols  (see  BBR  for  details). 

This  worst-case  formula  for  the  size  of  the  “expanded”  grammar  is  vin¬ 
dicated  in  practice.  Phillips  and  Thompson  (1985:252)  observe  that  in  their 
parser based on the GPSGs of Gazdar (1982), “To expand the [GPSG] grammar
completely ... would be ridiculously wasteful of space and time. (The
toy grammar of English we use with GPSGP [their parser], of 29 phrase-structure
rules and four metarules, which expands to 85 rules, is equivalent to
several tens of millions of context-free rules.)” Similarly, Shieber (1983:137)
notes that typical post-Gazdar (1982) GPSG systems contain “literally trillions”
of derived rules. In appendix A.2, I estimate that the GKPS grammar
for English contains more than 10^33 admissible local trees.


Consequences for GPSG Parsing. The Earley recognizer for context-free
grammars runs in time $O(|G'|^2 \cdot n^3)$ where |G'| is the size of the CFG G'
and n the input string length, so a GPSG G of size m will be recognized in
time

$O(3^{2 \cdot m! \cdot m^{2m+1}} \cdot n^3)$  (5)

The  hyperexponential  term  will  dominate  the  Earley  algorithm  complex¬ 
ity  in  the  reduction  above  because  m  is  a  function  of  the  size  of  the  ATM 
we  are  simulating.  Even  if  the  GPSG  is  held  constant,  the  stunning  de¬ 
rived  grammar  size  in  formula  4  turns  up  as  an  equally  stunning  “constant” 
multiplicative  factor  in  5,  which  in  turn  will  dominate  the  real-world  perfor¬ 
mance  of  the  Earley  algorithm  for  all  expected  inputs  (that  is,  any  that  can 
be  written  down  in  the  universe),  every  time  we  use  the  derived  grammar. 
This class of hyperexponential functions $c^{n^n}$ grows at a frightening rate: in
the mathematical worst case, if a GPSG with 2 symbols recognized a given
sentence in .001 second, a grammar with 3 symbols would recognize the
same sentence in 2.5 hours, and a grammar with a mere 4 symbols could
take at least 10^63 centuries.
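
The growth figures in the preceding paragraph can be approximated with a few lines of arithmetic. The sketch assumes a cost of the form c^(m^m) with base c = 2; the base is not stated in the text and is chosen here only because it comes close to the quoted figures (roughly 2.3 hours for 3 symbols and about 2 × 10^63 centuries for 4 symbols). Working in logarithms avoids floating-point overflow for larger m.

```python
import math

SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600


def log10_seconds(m: int, c: float = 2.0, base_m: int = 2,
                  base_seconds: float = 0.001) -> float:
    """log10 of the running time for m symbols if cost scales as c ** (m ** m).

    The time is normalized so that base_m symbols take base_seconds.
    """
    extra_exponent = m ** m - base_m ** base_m
    return math.log10(base_seconds) + extra_exponent * math.log10(c)


hours_for_3 = 10 ** log10_seconds(3) / 3600
log10_centuries_for_4 = log10_seconds(4) - math.log10(SECONDS_PER_CENTURY)
print(hours_for_3, log10_centuries_for_4)
```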

GPSG’s  intractability  appears  in  GPSG-based  parsers  in  two  ways,  strongly 
suggesting that GPSG’s intractability is not an artifact of complexity
theory.  First,  many  GPSG-based  parsers  appear  to  be  infamously  slow. 
For  example,  Evans  (1985:237)  experiences  the  real-world  intractability  of 
GPSG-Recognition  first  hand  in  his  GPSG-based  parser,  and  proposes  to 
manage  it  by  eliminating  lexical  ambiguity  and  by  keeping  both  grammar 
and  input  string  size  as  small  as  possible:  “The  attempts  to  overcome  the 
time  and  space  problems  have  only  been  partially  successful  ....  The  only 
remedies  seem  to  be,  keep  phrases  as  short  as  possible  (for  example,  do  not 
try  to  test  large  noun  phrases  inside  complex  sentences  if  it  can  be  avoided — 
use  proper  nouns  instead),  make  sure  no  words  are  duplicated  in  the  lexicon, 
keep the number of ID rules currently loaded down where possible . . .”

Second,  I  know  of  no  faithful  implementation  of  GKPS.  As  we  just  saw, 
there  are  too  many  possible  local  trees  for  any  computer  to  explicitly  cal¬ 
culate  GPSG  projection,  which  means  that  parsers  must  project  ID  rules 
on  the  fly.  But  we  know  from  theorems  1-4  that  the  GKPS  projection 
operation  cannot  even  be  computed  in  practice,  due  to  the  complexity  of 
metarules,  marking  conventions,  embedded  categories,  ID  rules,  and  ex¬ 
ceptional  feature  specifications.  Thus  it  will  not  be  possible  to  faithfully 
project  ID  rules  on  the  fly,  in  part  because  not  all  extensions  of  ID  rule 
categories are legal. For example, the categories VP[INV +, VFORM PAS]
and VP[INV +, VFORM FIN] are not legal extensions of VP in English due
to FCR 1 while VP[INV +, AUX +, VFORM FIN] is. But even if we ignore
the  significant  computational  complexity  introduced  by  syntactic  features, 
marking  conventions,  ID/LP  format,  null-transitions,  and  metarules,  the 
GPSG  recognition  and  projection  problems  will  still  be  intractable.  This  is 
because  the  head  feature  convention  alone  allows  GPSG  parse  trees  to  sim¬ 
ulate  unbounded  depth  polynomial  space  nonbranching  computation  trees: 
the recognition problem for these impoverished GPSGs would be PSPACE-hard
and still thought to be intractable. (This result should not be surprising,
because  the  HFC  in  current  GPSG  theory  replaces  some  metarules  in  earlier 
versions  of  GPSG  and  metarules  are  known  to  cause  intractability.)  Because 
no  faithful  implementation  of  GKPS  is  even  possible  in  practice,  computa¬ 
tional  linguists  have  no  choice  but  to  in  effect  invent  their  own  version  of 
GPSG  theory  to  implement.9 

I  am  not  impugning  any  of  these  GPSG-based  parsing  systems.  Rather, 
I  am  arguing  that  GPSG’s  theoretical  intractability  is  not  an  artifact  of 
complexity  theory  because  it  appears  in  the  real  world,  in  natural  language 
parsers  based  on  GPSG  theory,  which  will  be  as  slow  as  they  are  faithful  to 
GKPS. 

9 One such parser for English derived from GPSG is described in Harrison and Maxwell
(1986), who claim that “parser response time has been adequate for our development
purposes” (p. 10). However, they note that “in the presence of significant ambiguity, an
all-paths parser such as this one can experience a significant degradation [in] response
time” (p. 10). Their parser projects local trees on the fly, and fails to implement FCRs,
FSDs,  metarules,  ID/LP  format,  and  the  CAP  in  any  form.  They  disallow  exceptional 
feature  specifications;  feature  instantiation  is  ad  hoc,  and  not  faithful  to  the  HFC  and  FFP 
of  GKPS.  Harrison  (personal  communication)  attributes  the  parser’s  adequate  response 
times to clever programming and its departure from the specifics and generality of the
GKPS  formal  system  in  order  to  avoid  the  formal  excesses  of  GKPS. 
As we shall see below, GPSG’s intractability appears to be due partly
to oversights in its formalization and partly to an inherent property of
natural  language  grammars.  The  fact  that  revised  GPSG  has  the  empirical 
coverage  of  GPSG,  and  is  only  NP-complete,  argues  that  GPSG’s  EXP- 
POLY-hardness  arises  from  the  particular  formal  choices  GKPS  made.  But 
the  fact  that  intractability  in  both  GPSG  and  RGPSG  arises  from  the  need 
to  account  for  the  very  real  linguistic  phenomenon  of  nonlocal  syntactic 
agreement and ambiguity suggests that all natural language grammars may
be  intractable  (more  below). 

4.5.2  Restricting  the  GPSG  formal  system 

The  proof  of  theorem  5  tells  us  that  we  must  further  restrict  the  GPSG 
formal  system,  in  both  projection  and  derivation,  in  order  to  achieve  com¬ 
putational  tractability.  Let  us  now  consider  how  that  might  be  done  without 
curtailing  GPSG’s  descriptive  economy  too  much. 

Restricting  ID/LP.  ID  rules  significantly  increase  the  time  resources  re¬ 
quired  by  the  GPSG  derivation  process  in  three  related  ways.  First,  a  deriva¬ 
tion  step  is  nondeterministic  because  a  category  may  immediately  dominate 
more  than  one  RHS.  Second,  the  derivation  process  may  alternate  between 
a derivation step involving the ID rules C → C1 | ... | Ck that corresponds
to an OR-transition (only one of k possible successors must yield a terminal
string) and a derivation step involving an ID rule C → C1, C2, ..., Ck
that  corresponds  to  an  AND-transition  (all  k  successors  must  yield  terminal 
strings).  These  two  devices  introduce  lexical  and  structural  ambiguity.  As  is 
well-known,  ambiguity  is  a  central  property  of  natural  languages.  Therefore, 
this  aspect  of  ID  rules  is  linguistically  essential,  and  it  will  be  retained  in 
RGPSG. 

Third,  unrestricted  null  transitions  in  ID  rules  are  a  source  of  intractabil¬ 
ity  because  they  allow  GPSGs  to  generate  enormous  phrase  structure  trees 
whose  yield  is  the  empty  string.  If  there  were  no  null  transitions  in  ID  rules, 
then  the  GPSG  formal  system  could  only  simulate  polynomial  breadth  com¬ 
putation  trees.  This  is  so  because  the  polynomial  time  reduction  can  only 
write down polynomial length input strings x, and in a grammar free of null
transitions,  the  length  of  a  string  is  equal  to  the  breadth  of  its  parse  tree. 
Unrestricted null transitions are also undesirable according to classic linguistic
arguments, because GPSG theory with unrestricted null transitions
need  not  obey  recoverability  of  deletions  (ROD).  A  parser  that  used  such  a 
grammar  must  nondeterministically  postulate  elaborate  phrase  structure  in 
between  its  input  tokens. 

Although  unrestricted  null-transitions  violate  ROD  and  cause  unnatural 
computational  difficulties,  they  are  absolutely  needed  for  gaps:  the  RGPSG 
solution is to greatly restrict null-transitions by strengthening the X-bar theory
embodied  in  ID  rules. 

Restricting  Universal  Feature  Instantiation.  The  three  principles  of 
UFI  all  cause  intractability  because  they  allow  the  derivation  process  to  in 
effect  reuse  space  resources. 

First,  each  principle  of  UFI  can  enforce  nonlocal  feature  agreement  in 
phrase structure. Appendix B.1 shows how this causes NP-hardness, when
coupled  with  lexical  ambiguity  or  null  transitions.  A  related  source  of  in¬ 
tractability  is  that  the  projection  of  ID  rules  to  local  trees  can  create  an  as¬ 
tronomical  space  of  local  trees,  which  in  turn  increases  parser  search  space. 
These  two  sources  of  intractability  cannot  be  eliminated  because  they  are 
essential  to  GPSG’s  account  of  linguistic  agreement  among  conjuncts  and 
between  predicates  and  their  arguments,  gaps  and  their  fillers,  anaphors  and 
their  antecedents,  and  phrases  and  their  lexical  heads. 

The  use  of  exceptional  feature  specifications  in  these  principles  allows  a 
derivation  to  reuse  the  space  resources  provided  by  the  ID  rules  and  theory  of 
syntactic  features.  In  the  EXP-POLY  reduction  above,  head  features  encode 
nodes  of  the  computation  tree.  The  HFC  is  used  to  transfer  the  tape  contents 
of a configuration C0 (represented by the mother) to its immediate successors
C1, C2, ..., Ck (the head daughters). The configurations C0, C1, ..., Ck have
identical  tapes,  with  the  critical  exception  of  one  tape  square.  If  the  HFC 
enforced  absolute  agreement  between  the  head  features  of  the  mother  and 
head  daughters,  the  polynomial  space  AND/OR  computation  tree  could  not 
be simulated in this manner.


Restricting  Metarules.  Although  no  metarules  are  involved  in  the  EXP- 
POLY  reduction,  they  can  indirectly  increase  the  time  and  space  resources 
needed  by  the  derivation  process  by  introducing  null  transitions  and  ambi¬ 
guity  in  ID  rules. 
As we noted, unrestricted null transitions are both linguistically and
computationally  undesirable.  Moreover,  the  ability  of  metarules  to  affect 
lexical  head  daughters  is  in  direct  conflict  with  their  linguistic  purpose:  “to 
express  generalizations  about  the  subcategorization  possibilities  of  lexical 
heads.”  (GKPS:59)  Unrestricted  metarules  can  destroy  the  relation  between 
a phrase and its lexical head, and thereby violate X-bar theory. The first step in
revising  metarules  is  to  restrict  them  to  only  affect  the  mother  and  nonhead 
daughters  in  lexical  ID  rules.  Because  of  this  change,  metarules  cannot  alter 
the [NULL -] specification that appears on all head daughters in RGPSG
ID  rules.  Therefore,  once  a  category  is  expanded  in  an  RGPSG  derivation, 
it  must  be  lexically  realized  in  the  derived  string.  This  formal  constraint 
ensures  that  the  empty  string  does  not  have  elaborate  phrase  structure  in 
RGPSG. 

4.6 Sources of Intractability Summary

Figure  5  summarizes  the  sources  of  intractability  we  have  uncovered  by  ap¬ 
plying  complexity  analysis  to  four  carefully  selected  computational  problems 
posed within the GPSG formal system. In the next section, I present revised
GPSG. Of the more than ten sources of intractability lurking in GPSG,
only  two  remain  in  RGPSG  —  lexical  ambiguity  and  nonlocal  feature  agree¬ 
ment.  Critically,  these  two  sources  of  intractability  in  RGPSG  appear  to  be 
linguistically  essential  (see  Ristad  and  Berwick,  in  press). 
Syntactic Categories      Finite feature closure

ID Rules                  Unrestricted nulls (gaps)
                          Alternating derivations (lexical & structural ambiguity)
                          Unbounded multiset RHS

Metarules                 Introduce nulls & alternation
                          Finite closure

UFI                       Nonlocal agreement
                          Exceptional feature specification

Marking Conventions       Disjunctive consequence
(FCRs, FSDs)              Apply across embedded categories

Figure 5: Sources of Intractability in GPSG
5 The RGPSG Formal System


The  revision  of  the  GPSG  formal  system  is  driven  by  the  desire  to  strengthen 
linguistic  principles  embodied  in  it  and  reduce  the  computational  resources 
it  uses  in  both  projection  and  derivation.  The  common  theme  of  the  pro¬ 
posed  restrictions  is  to  reduce  the  range  and  interaction  of  RGPSG’s  formal 
devices. Specifically, RGPSG obeys stricter notions of X-bar theory, recoverability
of deletions, and permissible extraction domains than standard GPSG
does.  The  computational  restrictions  on  RGPSG  focus  on  bounding  the 
size,  depth,  breadth,  and  branching  type  of  the  computation  trees  that  can 
be  found  in  RGPSG’s  formal  devices. 

Recall  that  our  strategy  is  to  restrict  the  computational  power  of  a  for¬ 
mal  device  in  the  absence  of  linguistic  counterevidence.  As  we  shall  see,  our 
goal  of  an  efficient,  maximally  restricted,  descriptively  adequate  formal  sys¬ 
tem  is  frequently  at  odds  with  the  goal  of  a  simple  and  notationally  elegant 
formal  system.  Systems  with  the  simplest  rule  format  are  often  the  least 
restrictive — rewriting  rules,  for  example,  are  the  most  intractable  (undecid- 
able)  when  they  are  notationally  simplest  (unrestricted).  We  are  interested 
in  natural  restrictions  that  eliminate  unnatural  grammars  and  result  in  the 
most  efficient  formal  system,  not  ones  that  result  in  a  simpler  rule  format. 
As  Chomsky  (1965:61-2)  observed,  “the  critical  factor  in  the  development 
of  a  fully  adequate  theory  is  the  limitation  of  the  class  of  possible  grammars. 
. . . we should like to accept the least ‘powerful’ theory that is empirically
adequate.” 

5.1  Overview 

The  RGPSG  process  of  assigning  structural  descriptions  to  utterances  differs 
slightly  from  the  GKPS  conception.  Figure  6  shows  the  internal  organization 
of  RGPSG  projection.  First,  metarules  and  marking  conventions  are  applied 
to  ID  rules,  resulting  in  an  enlarged  set  of  ID  rules  R'.  Then  the  rules  in 
R'  are  used  to  derive  the  utterances  in  the  language  of  the  RGPSG.  Unlike 
GPSG,  the  RGPSG  derivation  operation  includes  UFI  and  LP  because  both 
devices are informationally encapsulated and functionally independent. The
lack of FCRs, FSDs, and exceptional feature specifications means that ID
rule extension is monotonic, unlike in GPSG: every legal ID rule has an
easily computed legal extension.

Figure 6: This diagram shows the projection of an RGPSG G with ID rules R,
metarules M, and simple defaults S. The O-bounds show the effect of various
formal devices on derived grammar symbol size.


5.2  Theory  of  Syntactic  Features 

The set K of RGPSG syntactic categories is specified by listing a set Feat
of features, a set Atom of atomic-valued features, a set A of atomic feature
values, and a function ρ that defines the range of each atomic-valued feature,
as in GPSG. The major change is unit feature closure instead of finite
feature closure: category-valued features may only contain 0-level categories
(0-level categories do not contain any category-valued features). RGPSG
adopts this strongly falsifiable constraint; the linguistic justification may be
found above. The depth of category-embedding is purely an empirical issue,
and hence unit closure is not ad hoc.

The other revision is primarily notational: any RGPSG feature f may
assume the distinguished values noBind or unBound in addition to those
values determined by ρ(f). A noBind value indicates that the feature may
not receive a value in any extension of the given category, while unBound
indicates that the feature does not currently have a value, and may receive
one in extension (having an unBound value is the same as being unspecified
in a unification grammar).
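
The feature theory above can be made concrete with a minimal sketch. It is an illustration only, not the paper's formal definition. It assumes categories are Python dictionaries, that a missing key stands for unBound, and that extension is checked recursively on category-valued features. The helper names (`value`, `is_zero_level`, `obeys_unit_closure`, `extends`) are hypothetical.

```python
UNBOUND = "unBound"  # no value yet; may receive one in an extension
NOBIND = "noBind"    # may never receive a value in any extension


def value(cat, feat):
    """Look up a feature, treating a missing entry as unBound (assumption)."""
    return cat.get(feat, UNBOUND)


def is_zero_level(cat):
    """A 0-level category contains no category-valued features."""
    return not any(isinstance(v, dict) for v in cat.values())


def obeys_unit_closure(cat):
    """Unit feature closure: embedded categories must be 0-level."""
    return all(is_zero_level(v) for v in cat.values() if isinstance(v, dict))


def extends(ext, base):
    """True if `ext` keeps every specification of `base` (monotonic growth)."""
    for feat in set(ext) | set(base):
        old, new = value(base, feat), value(ext, feat)
        if old == UNBOUND:
            continue  # an unBound feature may later receive any value
        if isinstance(old, dict) and isinstance(new, dict):
            if not extends(new, old):
                return False
        elif old != new:
            return False  # covers noBind: it can never be overwritten
    return True
```

Under these assumptions, a category marked `{"SLASH": NOBIND}` has no extension that supplies a SLASH value, whereas a category that simply lacks SLASH does.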


5.3  Immediate  Dominance/Linear  Precedence 

According to the simplest version of X-bar theory, all phrases must have heads.
Although GPSG lacks a formal constraint to this effect, in point of fact
every ID rule in the GKPS grammar for English has a head. For these
reasons, RGPSG ID rules must have exactly one mother and at least one
head daughter. The heads are separated notationally from the nonheads
by a colon, and appear to the left of the colon. The mother and all head
daughters are implicitly specified for [NULL -]. For example, the RGPSG
headed ID rule 6 corresponds to the GPSG ID rule 7.

V2 → [SUBCAT 2] : N2  (6)

V2[NULL -] → H[SUBCAT 2, NULL -], N2  (7)

There  is  only  one  lexical  element  for  the  null  string,  and  it  is  universal  across 
all  grammars: 

X2[SLASH X2₁, NULL +]₁ → ε  (8)

Co-subscripting indicates that the two X2 categories must be identical in
any legal projection of the rule, with the exception of the [NULL +] and
SLASH specifications. This restricted ID rule format, when coupled with a
restriction on metarules that prevents them from affecting head daughters,
prevents head daughters from ever being erased in an RGPSG derivation.
Thus, null transitions are effectively eliminated from RGPSG.

An ordered production is an ID rule whose daughters are completely
linearly ordered, that is, a string of daughter categories rather than multisets
of head and nonhead daughters. An ordered production is LP-acceptable if
all LP statements in the RGPSG are true of it.
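
As a rough illustration of LP-acceptability, the sketch below enumerates the ordered productions of a single headed ID rule and keeps those that satisfy every LP statement. It is a simplification under stated assumptions. Categories are atomic strings rather than feature bundles, and an LP statement is a hypothetical pair `(a, b)` read as "every a precedes every b". Both function names are illustrative.

```python
from itertools import permutations


def lp_acceptable(order, lp_statements):
    """Check that every LP statement (a, b), read 'a precedes b', holds."""
    positions = {}
    for index, label in enumerate(order):
        positions.setdefault(label, []).append(index)
    for first, second in lp_statements:
        if first in positions and second in positions:
            # Violated if any `first` occurs after some `second`.
            if max(positions[first]) > min(positions[second]):
                return False
    return True


def ordered_productions(mother, heads, nonheads, lp_statements):
    """Expand one headed ID rule into its LP-acceptable ordered productions."""
    daughters = tuple(heads) + tuple(nonheads)
    orders = sorted(set(permutations(daughters)))
    return [(mother, order) for order in orders
            if lp_acceptable(order, lp_statements)]
```

With b daughters there are at most b! candidate orders, so the brute-force expansion stays cheap only when rules are short.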

The RGPSG ID/LP formalism by itself does not contain formal constraints
sufficient to guarantee polynomial-time recognition, although the
linguistically justified use of short ID rules can render ID rules tractable,
because ID/LP grammars with bounded rules can be parsed in time
polynomial in the grammar size.10

10 If  the  length  bound  for  natural  language  grammars  is  the  constant  b,  then  any  ID/LP 

5.4  Metarules


An RGPSG metarule may only affect the mother and at most one nonhead
daughter in a lexical ID rule. Thus, the only way a metarule can affect a
head daughter is to introduce a new head feature on the mother, which will
appear on the head daughter by virtue of the HFC if and only if the head
daughter is unspecified for the new head feature. This is how the passive
metarule operates in RGPSG:

VP → W, NP
    ⇓                          (9)
VP[+PAS] → W, (PP[by])

The complete set of ID rules in an RGPSG is the maximal set that can be
arrived  at  by  taking  each  metarule  and  applying  it  to  the  set  of  rules  that 
did  not  themselves  arise  from  the  application  of  that  metarule  or  from  the 
application  of  one  or  more  other  metarules.  This  maximal  set  is  called  the 
biclosure  BC(M,R)  of  a  set  R  of  headed  lexical  ID  rules  under  a  set  M  of 
metarules. 

Recall that a metarule may determine more than one ID rule per input
ID rule because a metarule pattern may match an ID rule in more than
one way. Given a set of ID rules R whose size is n symbols, and given a
set of metarules M whose size is m symbols, the symbol-size of the unit
closure UC(M,R) is $\Theta(n + n \cdot m^2) = O(|G|^3)$. Each symbol in M can,
in the worst case, match each symbol in R, resulting in at most $\Theta(m)$ new
symbols per match. Therefore, the symbol-size of the biclosure BC(M,R)
is $O(n \cdot m^4) = O(|G|^5)$.
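
The remark that a metarule pattern may match an ID rule in more than one way can be seen in a small sketch. It is not the paper's definition of metarule application. Categories are plain strings, the matching test `matches` is a hypothetical prefix check, and the tags `[a]` and `[b]` are invented solely to tell two NP daughters apart. Head daughters are copied unchanged, mirroring the restriction that a metarule may touch only the mother and one nonhead daughter.

```python
def matches(category, pattern):
    """Hypothetical test: 'NP[a]' matches the pattern 'NP'."""
    return category == pattern or category.startswith(pattern + "[")


def apply_metarule(rule, mother_in, target, mother_out, replacement=None):
    """Return one output rule per nonhead daughter that matches `target`.

    rule is (mother, heads, nonheads) with heads and nonheads as tuples.
    """
    mother, heads, nonheads = rule
    if not matches(mother, mother_in):
        return []
    outputs = []
    for i, daughter in enumerate(nonheads):
        if matches(daughter, target):
            rest = nonheads[:i] + nonheads[i + 1:]
            new_nonheads = rest + ((replacement,) if replacement else ())
            candidate = (mother_out, heads, new_nonheads)
            if candidate not in outputs:
                outputs.append(candidate)
    return outputs


# A passive-like metarule applied to a rule with two NP nonhead daughters.
rule = ("VP", ("V0",), ("NP[a]", "NP[b]"))
outputs = apply_metarule(rule, "VP", "NP", "VP[+PAS]", "(PP[by])")
```

Here `outputs` holds two distinct rules, one for each NP that the pattern can consume, which is the source of the multiplicative growth in the closure size.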

5.5  Principles  of  Universal  Feature  Instantiation 

Principles of universal feature instantiation in RGPSG all preserve a
simple invariant across all ID rules. They are monotonic; that is, they never
delete or alter existing feature specifications. The head feature convention,

grammar G can be converted into a strongly equivalent CFG G′ of size
$\Theta(|G| \cdot b!) = O(|G|)$ by simply expanding out the constant number of
linear precedence possibilities. In the GKPS and RGPSG grammars for English,
b = 4 for the result of the Subject-Aux-Inversion metarule applying to a
[SUBCAT 44] rule headed by the auxiliary be. (I ignore the iterating
coordination schema, which licenses rules with unbounded right-hand sides.)

for example, strengthens X-bar theory by ensuring that the mother agrees
exactly with all head feature specifications that the head daughters agree on,
regardless of where the specifications come from.

Principles  of  UFI  govern  the  well-formedness  of  the  ID  rule  extension 
relation.  They  may  be  thought  of  as  first  applying  to  the  ID  rule  output 
of  metarule  biclosure  and  then  forever  afterwards  in  RGPSG  derivation.  As 
we  shall  see,  the  lack  of  exceptional  feature  specifications  in  RGPSG  means 
that  ID  rules  that  abrogate  the  constraints  of  UFI  cannot  be  written. 


Head feature convention. The head feature convention enforces the
invariant that the mother is in absolute agreement with all head features on
which the head daughters agree. It also requires the BAR value on a head
daughter to be less than or equal to the BAR value on the mother. HEAD
contains exactly those features that must be equivalent on the mother and
head daughters of every ID rule.11

HEAD = {AGR, ADV, AUX, INV, LOC, N, NFORM, PAS, PAST,
        PER, PFORM, PLU, PRD, V, VFORM}
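
The agreement invariant of the head feature convention might be sketched as follows. This is a simplified reading, not the paper's formal statement. Categories are dictionaries, a missing key means unspecified, the BAR condition is omitted, and links are not modeled. The function name `apply_hfc` and the convention of returning `None` for a rule with no legal extension are assumptions.

```python
HEAD = {"AGR", "ADV", "AUX", "INV", "LOC", "N", "NFORM", "PAS", "PAST",
        "PER", "PFORM", "PLU", "PRD", "V", "VFORM"}


def apply_hfc(mother, heads):
    """Monotonically instantiate head features; return None on a conflict."""
    mother = dict(mother)
    heads = [dict(h) for h in heads]
    for feat in sorted(HEAD):
        # A head feature on the mother reaches every head daughter
        # that is unspecified for it; existing values are never changed.
        if feat in mother:
            for head in heads:
                head.setdefault(feat, mother[feat])
        values = [head.get(feat) for head in heads]
        agree = all(v is not None and v == values[0] for v in values)
        if agree:
            # The mother must agree with what the head daughters agree on.
            if mother.setdefault(feat, values[0]) != values[0]:
                return None
    return mother, heads
```

For instance, a [+PAS] specification introduced on a VP mother by the passive metarule would reach a head daughter that is unspecified for PAS, while a head daughter already marked [-PAS] would leave the rule without a legal extension.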


Control  agreement  principle.  The  control  agreement  principle  (CAP) 
differs  from  the  HFC  in  that  it  establishes  equivalences  (links)  between  the 
categories  in  an  ID  rule:  when  two  categories  are  linked  in  an  ID  rule,  the 
two  categories  must  be  identical  in  any  legal  extension  of  that  rule.  Links 
are  calculated  immediately  after  the  HFC  has  applied  to  the  ID  rules  for 
the  first  time;  once  a  link  is  established  in  an  ID  rule,  it  cannot  be  changed 
or  undone.12  The  first  part  of  the  CAP  calculates  control  relations  between 
categories,  while  the  second  part  of  the  CAP  establishes  links  using  the 
control  relations.  In  all  cases,  linking  is  indicated  by  co-subscripting. 

11 In order to properly account for feature instantiation in the binary and
iterating coordination schemata, the binary head (BHEAD) features BAR, SUBJ,
SUBCAT, and SLASH are considered to be head features for the purposes of the
HFC in all nonlexical, multiply-headed ID rules.

12 In GKPS, only head feature specifications and inherited foot feature
specifications determine the semantic types relevant to the definition of
control. RGPSG simplifies this by considering inherited feature specifications
and only some head feature specifications. Alternatively, control relations
could be calculated every time the HFC instantiates a feature specification,
although this would violate monotonicity. In fact, the RGPSG CAP uses links
solely to preserve monotonicity of feature instantiation in ID rule projection,
which makes it easier to understand the consequences of a grammar.

CONTROL = {SLASH, AGR}

RGPSG control relations are calculated as follows. A predicate is a VP
or an instantiation of XP[+PRD] such as a predicate nominal or adjective
phrase. The control feature of a category C, where C(BAR) ≠ 0, is SLASH if
C is specified for SLASH; otherwise, it is AGR. Control is calculated once and
for all immediately after the HFC has applied to the ID rules resulting from
metarule biclosure.

Let $f$ be the control feature of a category $C_1$. Then $C_1$ is controlled by
$C_2$ in a rule if and only if $C_1(f) = C_2$, $C_2 \sqsupseteq \mathrm{X2}$, and
either the rule is $C_0 \to C_1 : C_2$ (recall that $C_1$ is the head daughter),
or the rule is $C_0 \to C_3 : C_1, C_2$, and $C_0, C_1 \sqsupseteq \mathrm{VP}$.
Here $\sqsupseteq$ denotes the extension relation.

The RGPSG control agreement principle states: In an ID rule
$C_0 \to C_1, \ldots, C_j : C_{j+1}, \ldots, C_n$

•  If $C_i$ controls $C_k$ and $f_k$ is the control feature of $C_k$, then
$C_k(f_k)$ and $C_i$ are linked.

•  If there is a nonhead predicate $C_i$ with no controller, then link
$C_i(f_i)$ and $C_0(f_0)$, where $f_i$ and $f_0$ are the control features of
$C_i$ and $C_0$, respectively.

In the theory of GKPS, the control agreement principle performs
subject-verb agreement by enforcing a control relation between the two
daughters of the rule

S → H[-SUBJ], X2

In RGPSG, this rule must be stated as

S → X2[-SUBJ, AGR X2₁] : X2₁

if we wish to enforce the control relation between the two daughters.
Because control relations in RGPSG are static (never recalculated), this
control relation exists even if, say, X2 = PP. Fortunately, verbs will only be
specified for legal X2 values in the lexicon, and therefore any “questionable”
control relations involving an X2 other than NP or S are ignored at the
lexical insertion level.

Foot feature principle. The foot feature principle (FFP) requires any
foot feature specification instantiated on a daughter category to also be
instantiated on the mother. The value assigned to an instantiated foot
feature is identical to every instantiation of the same foot feature on other
daughter categories. The FFP ensures that (1) the existence of inherited
foot features on any category of an ID rule blocks instantiation of those
foot features on any other component category of the rule, and (2) inherited
foot features are equivalent across all component categories of the rule (this
second condition may be too strong). Both conditions are designed to fix an
error in the foot feature principle (FFP) of GKPS, which permits material
to be topicalized from inside a topicalized constituent.13

FOOT  =  {RE,  SLASH,  WH} 

Because the empty string can be dominated only by a category of the form
α[NULL +, SLASH α] in RGPSG, the FFP tries to ensure that every gap
will have a unique filler. Unfortunately, it is impossible to truly guarantee
recoverability of deletions in RGPSG, because the FFP can only locally
constrain the rule-to-tree projection, and not the ID rules or the parse trees
themselves. This situation is unavoidable in the GPSG framework, simply
because SLASH does not always mark the complete path between a gap and
its filler in accepted GPSG analyses. The classic example is the GPSG
analysis of subject dependencies, where an S/NP is reanalyzed as a VP,
effectively deleting an NP gap in subject position. In GKPS, this operation is
performed by slash termination metarule 2 (pp. 160-162): [SLASH NP] only
marks the path from the filler to the mother of the reanalyzed VP. Another
example is the GKPS (pp. 150-152) analysis of missing-object constructions

13 ID rule 10 introduces topicalization constructions in English:

S → X, H/X  (10)

The control agreement principle (CAP) ensures that the two X categories in the
rule agree with each other. It is possible to instantiate [SLASH X] on the S
mother and X nonhead daughter without violating the GKPS CAP or FFP as in 11,
provided all occurrences of X are identical:

S/X → X/X, H/X  (11)

The  structure  11  satisfies  the  GKPS  CAP  because  the  ^-features  on  the  nonhead  daughter 
agree  with  the  slash  category  on  the  head  daughter;  the  GKPS  FFP  is  met  because  the 
foot  feature  specifications  instantiated  on  the  mother  are  also  instantiated  on  a  daughter. 
Revised GPSG prevents such extractions. The permissibility of the local tree 11
means that every topicalization structure is infinitely ambiguous in GKPS,
because the X/X nonhead daughter can be terminated with ε. See appendix A.3
for more details.

such as John is easy to please. In missing-object constructions, [SLASH NP]
only marks the path from the embedded NP object gap to the V2[INF]/NP
dominating to please, failing to continue through the AP easy to please to
the filler John. Many sweeping changes would be necessary before the FFP
would be able to strictly enforce recoverability of deletions in RGPSG.


Definition:  Free  Features 

A feature $f$ is free in the ID rule $r = C_0 \to C_1, \ldots, C_j :
C_{j+1}, \ldots, C_n$ iff $\forall i$, $0 \le i \le n$, $f \notin \mathrm{DOM}(C_i)$.

The foot feature principle states:

1.  When first applying to an ID rule r, if a foot feature f is not free
in r, then instantiate [f noBind] on all categories in r that are not
specified for f.

2.  When SLASH is instantiated on the mother, instantiate it on all
nonlexical head daughters.14

3.  When a foot feature f is instantiated on a daughter, instantiate it on
the mother.

14 This condition springs from the necessity of accounting for certain parasitic
gap facts according to the traditional GPSG analysis of clausal structure. The
problem arises in sentences of the form

Kim wondered [S which authors [S/NP reviewers of __ always detested __]]  (12)

where the parasitic gap is introduced by a binary nonlexical rule 13

S → X2[-SUBJ, AGR X2₁] : X2₁  (13)

rather than a ternary lexical rule like other parasitic gaps. Instantiating
SLASH on the X2 nonhead daughter must force the identical SLASH specification
on the mother and head daughter. SLASH isn't a head feature in RGPSG, so there
is no other way to accomplish this. A possible solution is to replace 13 with
rules to introduce clauses with and without parasitic gaps:

S → X2[-SUBJ, AGR X2₁] : X2₁

S/NP → X2[-SUBJ, AGR X2₁]/NP : X2₁[-NULL]/NP

We would then need to ensure that AGR was transferred from the head daughter
to the nonhead daughter by the CAP, despite the presence of the SLASH feature.

4.  When a foot feature f is instantiated on a mother, instantiate it on
one or more nonhead daughters.15
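
The definition of free features and the first clause of the foot feature principle lend themselves to a short sketch. It is an illustration under stated assumptions rather than the paper's formulation. A rule is a list of category dictionaries with the mother first, DOM(C) is taken to be the set of dictionary keys, and the names `is_free` and `ffp_initial` are hypothetical. The remaining clauses, which apply during later instantiation, are not modeled.

```python
FOOT = ("RE", "SLASH", "WH")
NOBIND = "noBind"


def is_free(feat, rule):
    """A feature is free in a rule iff no category in the rule mentions it."""
    return all(feat not in cat for cat in rule)


def ffp_initial(rule):
    """Clause 1: a foot feature that is not free becomes noBind on every
    category of the rule that is not already specified for it."""
    result = [dict(cat) for cat in rule]
    for feat in FOOT:
        if not is_free(feat, rule):
            for cat in result:
                cat.setdefault(feat, NOBIND)
    return result


# A topicalization-like rule: only the head daughter inherits SLASH.
rule = [{"CAT": "S"}, {"CAT": "S", "SLASH": {"BAR": 2}}, {"BAR": 2}]
marked = ffp_initial(rule)
```

After this step the mother and the nonhead daughter both carry [SLASH noBind], so no further SLASH value can be instantiated on them, while WH and RE remain free.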

5.6  Marking  Conventions

The sole explicit marking convention in RGPSG is the simple default (SD).
Unlike FCRs and FSDs, SDs are constructive, easy to understand and
computationally tractable. Each SD is applied to each category (and may be
understood) independently of all other categories and RGPSG formal devices.
SDs are applied in order to ID rules immediately after the initial application
of principles of UFI.

An  SD  contains  a  predicate  and  a  consequent.  The  consequent  is  a 
list  of  feature  specifications.  The  predicate  is  a  Boolean  combination  of 
truth-values and feature specifications such that if a category C bears or
extends  a  given  feature  specification,  that  feature  specification  is  true  of  C, 
else  false.  If  the  predicate  is  true  of  a  given  category  C  in  a  rule  and  the 
consequent  includes  only  unbound  and  unlinked  features,  then  the  feature 
specifications  listed  in  the  consequent  are  instantiated  on  C.  Each  SD  is 
applied  simultaneously  to  every  top-level  category  in  every  rule  exactly  once, 
in  the  order  specified  by  the  grammar.  Consider  the  following  SD: 

SD  1:  if  [SUBCAT]  then  [BAR  0] 

If the target category C in an ID rule is specified for the SUBCAT feature, but
unspecified for the BAR feature, then the SD will force the feature
specification [BAR 0] on C.

Given  a  list  of  simple  defaults  whose  symbol  size  is  p,  and  given  a  set  of 
ID  rules  whose  symbol  size  is  n,  the  resultant  set  of  ID  rules  can  at  most 
contain  0(n  •  p)  symbols  (attained  if  every  SD  were  true  of  every  category 
and  every  category  consisted  of  a  lone  symbol). 
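
Simple default application, as described above, can be sketched in a few lines. This is a hedged illustration, not the paper's definition. Categories are dictionaries, a predicate is an arbitrary Python function over a category, linked features are passed in as a plain set, and the names `apply_sd`, `apply_sds`, and `SD1` are hypothetical.

```python
def apply_sd(cat, predicate, consequent, linked=frozenset()):
    """Instantiate the consequent on `cat` if the predicate holds and every
    consequent feature is still unbound and unlinked; otherwise copy `cat`."""
    available = all(f not in cat and f not in linked for f in consequent)
    if predicate(cat) and available:
        return {**cat, **consequent}
    return dict(cat)


def apply_sds(rule, sds):
    """Apply each SD once, in grammar order, to every category in the rule."""
    for predicate, consequent in sds:
        rule = [apply_sd(cat, predicate, consequent) for cat in rule]
    return rule


# SD 1: if [SUBCAT] then [BAR 0]
SD1 = (lambda cat: "SUBCAT" in cat, {"BAR": 0})

result = apply_sds([{"V": "+", "BAR": 2}, {"SUBCAT": 2}, {"N": "+"}], [SD1])
```

Each SD inspects one category at a time and adds at most its own consequent, which is the basis of the O(n · p) bound stated above.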

The elimination of FCRs does not appreciably reduce the descriptive
elegance of RGPSG grammars: see BBR for an RGPSG describing English,
roughly equivalent in symbol count and descriptive adequacy to the English
GPSG provided by GKPS. This is true because the constraints expressed
by FCRs are, for the most part, already expressed in the way ID rules and
metarules are written. Conflicts between FCRs and existing categories
typically either indicate an error on the grammar writer’s part, overgeneration

15 Note that this condition will only affect the foot features RE and WH, and never SLASH.

of possible structures, or the lack of inviolable constraint in GPSG. These
conflicts are less likely to occur in RGPSG because RGPSG's formal
devices are more constrained than those in GPSG. More generally, although
the computational weakening of the formal theory has decreased the
descriptive power available to the grammar writer, the linguistic strengthening
of the formal model has increased the descriptive elegance of the theory as a
whole: more is described in RGPSG by the formal theory than by the
grammar writer. Simply put, it is harder to write unnatural grammars in RGPSG.
But if the elimination of FCRs proves unacceptable, then the complex
symbol rules of Chomsky (1965:79-83) may be used to specify possible syntactic
categories. Complex symbol rules could increase descriptive elegance,
linguistic universalism, and empirical coverage without causing intractability.
For example, complex symbol rules can make the NFORM, PFORM, and VFORM
features universally mutually exclusive.

5.7  Derivation  and  projection  in  RGPSG 

To conclude, we must determine how the formal subsystems described above
fit together, beginning by formally specifying the class of RGPSGs and the
languages they generate. A subsequent section translates the GKPS
analysis of topicalization, expletive pronouns, and parasitic gaps to the RGPSG
formal system.

The set of ID rules R' resulting from metarule biclosure, UFI, and SD
application generates the language of the RGPSG as follows. If R' contains a
rule $A \to \gamma$ with an extension $A' \to \gamma'$ that satisfies all
principles of UFI and is an LP-acceptable ordered production, then for any
string of terminals $\alpha$ and nonterminals $\beta$, we write
$\alpha A' \beta \Rightarrow \alpha \gamma' \beta$. This is a derivation step.
The language of an RGPSG contains all terminal strings that can be derived,
using the ID rules, from any extension of the distinguished start category.
Let $\stackrel{*}{\Rightarrow}$ be the reflexive transitive closure of
$\Rightarrow$. Then the language L(G) generated by G is

$L(G) = \{ x \mid x \in V_T^{*} \text{ and } \exists C \in K\,[(C \sqsupseteq \mathit{Start}) \wedge C \stackrel{*}{\Rightarrow} x] \}$

5.8  Complexity  of  RGPSG  Recognition 

The universal recognition problem for RGPSG is NP-complete. Recall
that a problem is NP-complete iff it is NP-hard and in NP. Informally,
RGPSG-Recognition is in NP because the restricted ID rule format (no
null-transitions) ensures a polynomial bound on the length of the shortest
derivation. In the worst case a branching ID rule can be converted to a
nonbranching ordered local tree in a derivation (this happens when all nonhead
daughters are erased, either by a metarule or by the universal ID rule 8,
leaving behind at least one head daughter, which can never be erased). Once
a category is expanded in a derivation, it must be lexically realized in the
derived string. Therefore a terminal string of length n can be derived with
at most $p \cdot n$ productions in an RGPSG with p productions. Appendix B.1
contains a formal proof.

Unfortunately, it is difficult to use our conceptual typology of
computation trees to establish the NP-hardness of RGPSG Recognition. Although
an RGPSG parse tree may appear structurally equivalent to a polynomial
depth pruned OR computation tree, we must be careful. As in the reduction
for GPSG Recognition, RGPSG categories can encode nodes in the
computation tree, and ID rules can represent the Turing machine OR transitions.
But the extension relation in the RGPSG derivation step is governed by the
RGPSG head feature convention, which ensures that all heads
dominated by a common head will have the same head features. This means we
cannot use the HFC to transfer unaltered tape squares from a configuration
to its successors, which blocks the most obvious reduction. The trick is to
encode the entire pruned computation tree in an RGPSG category and use
ID rules to enforce the next-move relation between subsets of the category
(see the proof in appendix B.2). The actual reduction is so complicated that we
lose the conceptual advantage of the uniform class of computation trees.
Terminal ambiguity and nonlocal agreement (via universal feature
instantiation) in RGPSG permit a considerably simpler reduction from
Satisfiability, a known NP-complete problem, as shown in appendix B.1.

The restrictions motivated above have resulted in a substantial decrease
in complexity from the EXP-POLY time hardness of GPSG-Recognition. In
fact, only two sources of intractability remain in RGPSG: lexical
ambiguity and nonlocal feature agreement (compare figures 4 and 7).

This decrease in complexity is significant from both theoretical and
practical perspectives. First, NP-complete problems typically have good average
time algorithms and highly efficient near-optimal solution techniques, while
EXP-POLY problems do not. Next, the fastest recognizer known for GPSGs
can require double-exponential time in the worst case, while RGPSG has a

Syntactic Categories
•  Unit feature closure

ID Rules
•  At least one head daughter
•  Mother & heads [NULL -]
•  Bounded branching (4)

Metarules
•  Cannot directly affect heads
•  Biclosure

UFI
•  No exceptional feature specifications (UFI preserves simple invariant)
•  More restricted linguistically
•  Monotonic

Marking Conventions
•  No disjunctive consequences (Simple Default)
•  Cannot interfere with UFI
•  No FCRs

Figure 7: RGPSG Major Changes

simple exponential time recognizer. Finally, NP-complete problems have
efficient witnesses, while EXP-POLY hard problems do not. This means that
RGPSG parses can always be verified efficiently, while GPSG parses cannot,
in general.

6  Linguistic  Analysis  of  English  in  RGPSG


This section reproduces three of the more intricate linguistic analyses of
GKPS in order to illustrate RGPSG’s formalisms. To reproduce their
comprehensive analysis of English in toto would be a disservice to that work and
is beyond the scope of this paper. Instead, an RGPSG roughly equivalent in
symbol count and descriptive adequacy to their GPSG for English may be
found in appendix B of BBR; the reader should consult GKPS for the
accompanying linguistic exposition. In all cases, co-subscripting indicates the
linking performed by the CAP.

The  RGPSG  grammar  for  English  serves  to  demonstrate  the  empirical 
adequacy  of  the  restricted  RGPSG  formal  system.  RGPSG  is  empirically 
superior  to  GPSG  not  because  its  English  grammar  is  better,  but  because 
it  achieves  descriptive  adequacy  within  a  vastly  more  restricted  class  of 
grammars. 


6.1  Topicalization


The rule 14a expands clauses and rule 14b introduces unbounded
dependency constructions (UDCs) in English.

a.  S → X2[-SUBJ, AGR X2] : X2
                                              (14)
b.  S → X2[+SUBJ, SLASH X2] : X2


In both cases the X2 nonhead daughter controls the head daughter, and the
control agreement principle links the value of the head daughter’s control
feature with the X2 daughter, creating the ID rules in 15.

a.  S → VP[AGR X2₁] : X2₁
                                              (15)
b.  S[SLASH noBind] → S[SLASH X2₂] : X2₂[SLASH noBind]


In the following discussion, [3s] and [3p] abbreviate [PER 3, -PLU] and
[PER 3, +PLU], respectively. Consider the topicalization structure in
figure 8, taken from GKPS, p. 145.

Note  that  it  is  impossible  to  extract  any  constituent  out  of  the  X2  daughter 
in  15b  because  the  foot  feature  principle  has  forced  [SLASH  noBind]  on 
the  X2  daughter  and  its  mother.  This  explains  the  unacceptability  of  16 
in  RGPSG,  which  is  permissible  in  the  theory  of  GKPS  (see  figure  9  in 

Figure 8: This is a typical topicalization structure in RGPSG. Co-subscripted
categories are linked by the control agreement principle (a principle of universal
feature instantiation), and therefore share all absent feature specifications. The
foot feature principle and the CAP combine to ensure that all instances of the
topicalized category (NP[3s]) agree. A dark line marks the extraction path.

appendix A.3).


* Boston [[a man from __] [we want __ to succeed]]  (16)


6.2  Expletive  pronouns


This section accounts for the distribution of the expletive pronouns it and
there in infinitival constructions on the basis of postulated ID rules and
principles of universal feature instantiation (see GKPS, pp. 115-121). The
feature specification [AGR NP[NFORM α]] is abbreviated as +α below, where α is
it, there, or NORM.

The  RGPSG  for  English  includes  the  ID  rules  17, 

a.  S → X2[-SUBJ, AGR X2] : X2

b.  VP → [13] : VP[INF]

c.  VP → [16] : (PP[to]), VP[INF]                (17)

d.  VP → [17] : NP, VP[INF]

e.  VP[AGR S] → [20] : NP


the  simple  defaults  18, 

a.  SD 1: if [SUBCAT] then [BAR 0]
                                                 (18)
b.  SD 2: if [+V, -N, -SUBJ] then [+NORM]

the  extraposition  metarule  19, 

X2[AGR S] → W
    ⇓                          (19)
X2[+it] → W, S


and the lexical entries 20. All other nouns are specified for [NFORM NORM]
by their lexical entries.

(it, NP[PRO, -PLU, NFORM it])
(there, NP[PRO, NFORM there])                    (20)


From the ID rules in 17, RGPSG generates the following ID rules.

a.  VP[AGR₁] → V0[13, AGR₁] : VP[INF, AGR₁]
                                                 (21)
b.  VP[AGR₁] → V0[16, AGR₁] : (PP[to]), VP[INF, AGR₁]

The absence of a controlling category allows the CAP to link the AGR values
of the mother and VP[INF] predicate daughter. The HFC then links the
AGR values of the mother and lexical head daughter. SD 1 specifies the head
daughter for [BAR 0], while SD 2 cannot affect the linked AGR values.

VP[AGR₁ NP[NORM]] → V0[14, AGR₁ NP[NORM]] : V2[INF, AGR₁ NP[NORM]]

The CAP and HFC operate identically as in 21, except that the [+NORM]
specification is inherited from the ID rule 17b and propagated through the
rule by the CAP and HFC.

VP[AGR₂ NP[NORM]] → V0[17, AGR₂ NP[NORM]] : NP₁, V2[INF, AGR₁ NP]  (22)

The NP daughter controls its V2[INF] sister, and the CAP links the AGR
value of the VP to its sister NP. SD 2 specifies the mother for [+NORM], and
the HFC forces this specification on the head daughter.

The rules in 23 introduce [+it] and [+there] specifications. Note that
23a is the result of the extraposition metarule on the ID rule 17e.

a. VP[+it] → [20] : NP, S

b. VP[+it] → [21] : (PP[to]), S[FIN]          (23)

c. VP[AGR NP[+there, PLU α]] → [22] : NP[PLU α]

The  rules  in  23  may  only  expand  the  VP  daughters  of  the  ID  rules  21 
and  22  in  a  derivation  (compare  their  AGR  values).  Thus,  the  grammar  claims 
that  expletive  pronouns  only  occur  in  utterances  generated  using  the  rules 
in  23,  in  combination  with  the  “extending”  rules  21  and  22.  This  describes 
the  following  facts  from  GKPS,  p.  120. 16 


{ It / *There / *Kim }
    [ continues [ to bother [ Lou ] [ that Robin was chosen ]]]          (24)

{ *It / There / *Kim }
    [ appeared (to us) [ to be [ nothing in the park ]]]                 (25)

Leslie [ believed { it / *there / *Kim }
    [ to bother [ us ] [ that Lee lied ]]]                               (26)

We [ believed { *it / there / *Kim }
    [ to be [ no flaws in the argument ]]]                               (27)

16 In order to better understand these examples, associate each constituent with the ID
rule that generated it. To help with this task, the main verbs and their SUBCAT values are:
(continue, 13), (appear, 16), (believe, 17), (bother, 20), (be, 22).


6.3  Parasitic  gaps 


Simple parasitic gaps, that is, those introduced in verb phrases by lexical
rules, present no problem for RGPSG because the FFP demands all
instantiations of SLASH on daughters to be equal to each other and equal to the
SLASH instantiation on the mother.


              VP/NP
   V0[13]     NP/NP     PP[to]/NP          (28)


Kim wondered which models Sandy

    { [ had sent [ pictures of _ ] [ to _ ]]
      [ had sent [ pictures of _ ] [ to Bill ]]
      [ had sent [ pictures of Bill ] [ to _ ]] }          (29)


The FFP insists nonlexical heads be instantiated for SLASH if any nonhead
daughter is, thereby explaining the unacceptability of 30 and the
acceptability of 31.


a. *        S/NP
       NP/NP     VP                        (30)

b. * Kim wondered which authors
     [[ reviewers of _ ] [ always detested sushi ]]

a.          S/NP
       NP/NP     VP/NP                     (31)

b. Kim wondered which authors
   [[ reviewers of _ ] [ always detested _ ]]
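The two FFP conditions described in this section can be illustrated with a small check on local trees. This is only a sketch under stated assumptions: the `Daughter` record, the function name `ffp_allows`, and the treatment of a mother that bears SLASH when no daughter does are all hypothetical and are not part of the RGPSG definitions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Daughter:
    """Hypothetical record for one daughter of a local tree."""
    label: str                   # e.g. "NP", "VP", "V0"
    slash: Optional[str] = None  # instantiated SLASH value, or None
    is_head: bool = False
    is_lexical: bool = False     # True for a [BAR 0] head


def ffp_allows(mother_slash: Optional[str],
               daughters: Sequence[Daughter]) -> bool:
    """Check the two SLASH conditions stated in the text (sketch only)."""
    slashes = {d.slash for d in daughters if d.slash is not None}
    # All SLASH instantiations on daughters must be equal to each other.
    if len(slashes) > 1:
        return False
    # They must also equal the SLASH instantiation on the mother.
    # Assumption: a mother with SLASH but no slashed daughter is rejected.
    if slashes != ({mother_slash} if mother_slash is not None else set()):
        return False
    # A nonlexical head must bear SLASH if any nonhead daughter does.
    nonhead_slashed = any(d.slash is not None
                          for d in daughters if not d.is_head)
    for d in daughters:
        if d.is_head and not d.is_lexical and nonhead_slashed and d.slash is None:
            return False
    return True


# Local tree 28: lexical head, both complements slashed.
tree_28 = [Daughter("V0", None, is_head=True, is_lexical=True),
           Daughter("NP", "NP"), Daughter("PP[to]", "NP")]
# Local tree 30a: slashed subject, unslashed nonlexical head VP.
tree_30 = [Daughter("NP", "NP"), Daughter("VP", None, is_head=True)]
# Local tree 31a: slashed subject and slashed head VP.
tree_31 = [Daughter("NP", "NP"), Daughter("VP", "NP", is_head=True)]

print(ffp_allows("NP", tree_28), ffp_allows("NP", tree_30), ffp_allows("NP", tree_31))
```

Under these assumptions the check accepts the local trees in 28 and 31a and rejects the one in 30a, matching the judgments reported above.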
This analysis of parasitic gaps exactly follows the one presented in GKPS on
matters of fact. These facts may be questionable, however. Some sentences
considered acceptable in GKPS (for example, Kim wondered which authors
reviewers of always detested) are marginal for some native English speakers.

7  A Change in Perspective


This work is similar to that of Shieber (1986) in its attempt to
reconstruct GPSG theory. Shieber, however, is concerned solely with creating
a more easily implementable and understandable description of GPSG
theory, rather than with changing the theory’s generative or computational
power.

A central goal of mathematical linguistics is to precisely determine the
power of a linguistic theory. Traditionally, formal language theory (the
Chomsky hierarchy) and its generative power analyses have translated this
question into the narrower question of how unrestricted the rule format of
a theory is. We have seen that modern computational complexity theory
offers another, more useful, translation: how much of what computational
resources does a theory consume? Complexity theory also offers a new
perspective on descriptive adequacy. Descriptive adequacy, as commonly
understood, refers to a theory’s ability to assign the same structural descriptions
to utterances that humans do. This is the perspective of formal language
theory and E-language. But from a computational complexity perspective,
descriptive adequacy refers to how faithfully the internal structure of a
linguistic theory (its representations and internal operations) corresponds
to the internal structure of our language facility. In a descriptively adequate
linguistic theory, the structural descriptions and computational power of the
theory match those of an ideal speaker-hearer.

A  Complexity of GPSG reconsidered


In this section the universal recognition problem for GPSGs is proved
formally to be EXP-POLY time-hard, the number of local trees in the GKPS
grammar for English is underestimated, and the consequences of GPSG’s
theoretical intractability are considered from the perspective of a grammar
writer.


A.1  GPSG Recognition is EXP-POLY time-hard


Definition:

A k-tape alternating Turing machine is an 11-tuple:17

    M = ⟨Q, Σ, Γ, $, #, k, δ, q0, Final, U, E⟩

where

    Q, q0, Final   set of states, initial state, set of accepting states
    Σ, Γ           input, tape alphabets, Σ ⊂ Γ
    $, #           endmarker, blank symbol, $, # ∈ Γ − Σ
    U              set of universal states,
                   U ⊂ Q, U disjoint from Final and E
    E              set of existential states,
                   E ⊂ Q, E disjoint from Final and U
    k              number of read-write tapes, k ≥ 1
    Q'             U ∪ E
    δ              next-move relation, where
                   δ ⊂ (Q' × Γ^(k+1)) × (Q × Γ^k × {Left, Right}^(k+1))
    Left           −1
    Right          +1


The ATM has a read-only input tape, with the input w ∈ Σ* written as $w$
and the reading head initialized to the first symbol of w. The k work tapes
are one-way infinite and are initially blank. A configuration of the ATM
consists of the state together with the head positions and contents of the
k + 1 tapes. A move of the ATM consists of reading one symbol from the
input tape and moving the heads left/right as allowed by δ, in addition to
changing the state of the machine. The directions Left and Right have the
numerical values −1 and +1, respectively, for convenience in proofs. δ does
not include any transitions from accepting states. We say a configuration of
M is existential, universal, or accepting if the state of the TM in that
configuration is; in this formalization, an accepting configuration does not
need any special tape contents, but only an accepting machine state.

17 This definition is based on Chandra and Stockmeyer (1976). We have taken the work
tapes to be one-way infinite instead of two-way infinite, in addition to making other minor
changes.

For a configuration C of M, let the sequence Next_M(C) = (C_0, ..., C_{k-1})
enumerate the possible successor configurations of C according to δ. Here k,
the number of successors, is bounded above by the number of pairs in the
relation δ, which we may write as |δ|.

The computation of an alternating TM M on an input w is a possibly
infinite tree where the nodes correspond to ATM configurations, that is,
an AND/OR computation tree whose outdegree is |δ|. Each node of the
computation contains a machine configuration that is reachable from the
configuration above it, according to the next-move relation. However, to
build a possibly infinite tree the nodes must be made mathematically
distinct. This is accomplished by defining each node as a pair (x, C) where C
is the machine configuration and x is a tree position. Technically, the tree
position is a string of numbers that identify a position in the tree by listing
which branch to take at each node; the numbers are all between 0 and |δ| − 1.
The root position is the empty string, so the root node of the tree is (ε, C_0)
where C_0 is the initial configuration. The daughters of any node (x, C) are
given by NextNode_M(x, C) where

    NextNode_M(x, C) = {(xi, C_i) : Next_M(C) = (..., C_i, ...)}.

The concatenation xi identifies a unique daughter of the position x by adding
another branch number at the end.

The criterion for acceptance in an ATM computation tree is as follows.
Let N be the set of nodes in the computation tree of ATM M on input w.
We label the nodes of the tree either true or false as follows. A labeling
L : N → {true, false} is said to be acceptable if the labeling of each node
(x, C) satisfies one of the following conditions:

1. C is an accepting configuration and L(x, C) = true.

2. C is an existential configuration and

   \[ L(x, C) = \bigvee_{(xi,\,C') \in NextNode_M(x, C)} L(xi, C') \]

3. C is a universal configuration and

   \[ L(x, C) = \bigwedge_{(xi,\,C') \in NextNode_M(x, C)} L(xi, C') \]

   To simplify matters, we also require that NextNode_M(x, C) be nonempty
   in this case.

By definition, ⋁ of an empty set is false. M is defined to accept the input
w if and only if L(ε, C_0) = true for all acceptable labelings L. Note that
ATMs without universal states operate exactly as nondeterministic TMs do.
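For a finite computation tree the acceptable labeling is unique and can be computed bottom-up. The following is a minimal sketch, assuming a finite tree and abstract configurations supplied by the caller; the names `label_tree`, `kind`, and `successors` are illustrative and do not come from the definition above.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

EXISTENTIAL, UNIVERSAL, ACCEPTING = "E", "U", "A"
Position = Tuple[int, ...]  # tree position: branch numbers from the root


def label_tree(root: Hashable,
               kind: Callable[[Hashable], str],
               successors: Callable[[Hashable], Sequence[Hashable]]
               ) -> Dict[Tuple[Position, Hashable], bool]:
    """Label every node (x, C) of a finite AND/OR computation tree."""
    labels: Dict[Tuple[Position, Hashable], bool] = {}

    def visit(position: Position, config: Hashable) -> bool:
        k = kind(config)
        if k == ACCEPTING:
            value = True
        else:
            # Daughter i of position x sits at position xi.
            kids = [visit(position + (i,), c)
                    for i, c in enumerate(successors(config))]
            if k == EXISTENTIAL:
                value = any(kids)  # an empty disjunction is false
            else:
                if not kids:
                    raise ValueError("universal node needs a successor")
                value = all(kids)
        labels[(position, config)] = value
        return value

    visit((), root)
    return labels


# Toy machine: c0 is universal, c1 and c3 existential, c2 and c4 accepting.
KIND = {"c0": UNIVERSAL, "c1": EXISTENTIAL, "c2": ACCEPTING,
        "c3": EXISTENTIAL, "c4": ACCEPTING}
NEXT = {"c0": ["c1", "c2"], "c1": ["c3", "c4"], "c3": []}

labels = label_tree("c0", KIND.__getitem__, lambda c: NEXT.get(c, []))
print(labels[((), "c0")])
```

In the toy example the dead-end existential configuration c3 is labeled false, but its sister c4 accepts, so c1 and hence the universal root c0 are labeled true.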

Chandra-Kozen-Stockmeyer (1976) prove that

\[ ASPACE(S(n)) = \bigcup_{c > 0} DTIME\bigl(c^{S(n)}\bigr) \]

where ASPACE(S(n)) is the class of problems solvable in space S(n) on an
ATM and DTIME(F(n)) is the class of problems solvable in time F(n) on
a deterministic Turing machine. In particular, when S(n) ranges over the
class of all polynomial functions, the formula tells us that polynomial space
on an ATM is equivalent to exponential-polynomial time on a deterministic TM.

The following proof reduces instances of polynomial space-bounded
alternating Turing machines to instances of GPSG Recognition.

Theorem 0  GPSG-Recognition is EXP-POLY time-hard.

Proof. By direct simulation of ATM M on input w.18 Let M be a 1-tape
ATM with polynomial space bound S(n); let w be its input. Given these
reduction inputs, we will construct a GPSG G in polynomial time such that
M accepts w iff

    $0 w_1 1 w_2 2 ... w_n (n) $(n + 1) ∈ L(G).

18 Without loss of generality, we use a 1-tape ATM, so

    δ ⊂ (Q' × Γ × Γ) × (Q × Γ × {Left, Right} × {Left, Right}).

Also, in the reduction, note that the word input refers to three completely distinct objects.
The ATM input string w is the string which may or may not be in the language generated
by the ATM. The GPSG input string x is the string which may or may not be in the
language generated by the GPSG; x and w are never the same. The reduction input is
the problem instance ⟨M, w⟩, i.e., the ATM M and its input string w. It is important not
to confuse the three distinct uses by believing, for example, that the GPSG accepts the
same language as the ATM. They cannot accept the same language in principle.

By Chandra-Kozen-Stockmeyer (1976), the class of problems solvable in
polynomial space S(n) on an ATM is exactly equivalent to the class of
problems solvable in exponential polynomial time on a DTM. Therefore, given
our following proof, we have the immediate result that GPSG-Recognition
is DTIME(c^S(n))-hard, for all constants c, or EXP-POLY time-hard.

The basic plan of the reduction is to reproduce a pruned computation
tree of the ATM as the parse tree of the GPSG. The GPSG will assign
this elaborate structure to the empty string and not to the machine input.
However, before the ATM simulation starts there will be some auxiliary
structure that copies the machine input w into the features that represent
the ATM input tape. The actual input that is presented to the grammar
will therefore include an encoded version of w in addition to the “very long
empty string” over which the computation tree is built.

Configurations of the ATM will be encoded as zero-level syntactic
categories. Because the amount of tape the machine can use is bounded by the
known quantity S(|w|), we can use a separate feature to record the contents
of each tape square. We also need three features to encode the ATM head
position and current state. In a polynomial-time reduction, we are limited to
specifying a polynomial number of features (for tape squares) and
feature-values (for head positions), and that is why the reduction will be limited to
polynomial space bounded ATM computations (S(n) a polynomial).

The immediate domination (ID) rules of the GPSG will encode the δ
relation of the ATM. The category corresponding to any configuration C
can dominate the category corresponding to C' in a local tree iff δ licenses
the transition (C, C'). (The exact details depend on whether the
configuration C is universal or existential.) The reduction preserves the invariant
that a nonterminal in the grammar can be terminated iff the configuration
that it represents must be labeled true in the ATM computation.
Consequently, the local tree for a universal configuration must include every
successor configuration as a daughter. In contrast, the local tree for an
existential configuration must merely include some successor configuration as
a daughter. Nonterminals corresponding to halted, accepting configurations
are terminated by the empty string.

Let Next_M(C) = (C_0, ..., C_p). If C is a universal configuration, we would
like to include the ID rule C → C_0, ..., C_p; if C is an existential
configuration, we would like to include p + 1 rules of the form C → C_i. However, we
cannot use such rules directly in a polynomial-time reduction; there are far
too many possible configurations, and we would need at least one rule for
each. Instead, we must set up the features that encode the configurations
in such a way that the ID rules only have to encode the relation δ, which is
infinitely smaller than the Next_M(·) function. Each δ-transition of the ATM
licenses infinitely many transitions between configurations because δ does
not care about the tape squares in a configuration that the machine is not
currently scanning. In the same way, each ID rule of the constructed GPSG
will project into a large number of local trees. The unchanged portion of
a tape will not be transferred from a configuration to its successor by the
ID rule, but will be transferred by the head feature convention (HFC, a
principle of universal feature instantiation). All features that represent tape
squares are declared to be head features and all daughters are head
daughters. Consequently, the HFC will transfer the tape contents of the mother to
the daughters except when prevented by the tape-writing activity specified
by the next-move relation.

Proceeding to the details of the reduction, the following features are used
to represent M-configurations:

    STATE:      the state of the machine
    INPUTPOS:   the head position of the read-only input tape
    WORKPOS:    the head position of the read-write work tape
    INPUT_i:    the contents of the ith square of the input tape
    WORK_i:     the contents of the ith square of the work tape


In addition, the feature PHASE will be used to separate functionally distinct
regions of the parse tree. [PHASE READ] categories are involved in
reading the input string, [PHASE RUN] categories participate in the direct ATM
simulation, and the [PHASE START] category links the READ and RUN phases.

As we have mentioned, the input string that is presented to the GPSG
has the form

    $0 w_1 1 w_2 2 ... w_n (n) $(n + 1)

where the w_i are the characters of the machine input, the “$” characters are
endmarkers, and 0, ..., (n + 1) are regarded as additional characters. We
must copy $w$ onto the input tape of the simulated machine. For every
character index i, 1 ≤ i ≤ |w|, and for every possible character a ∈ Σ,
include the following lexical rule for the lexical item “ai”:

    ⟨ai, {[PHASE READ], [INPUT_i a]}⟩

In addition, for every index over a wider range, 0 ≤ i ≤ |w| + 1, include this
lexical rule for the endmarker:

    ⟨$i, {[PHASE READ], [INPUT_i $]}⟩

Once these rules are constructed, they will work for other inputs w' of length
|w'| < |w| as well as for w. That is why endmarkers in the middle of w have
been allowed in the copying rules.
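The input encoding and the copying rules can be sketched as follows. This is an illustration only: lexical items are modeled as (character, index) pairs and categories as dictionaries, and the function names `grammar_input` and `lexical_rules` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

LexicalItem = Tuple[str, int]  # a character paired with its index


def grammar_input(w: str) -> List[LexicalItem]:
    """Encode w as the string $0 w_1 1 ... w_n n $(n+1)."""
    n = len(w)
    body = [(ch, i) for i, ch in enumerate(w, start=1)]
    return [("$", 0)] + body + [("$", n + 1)]


def lexical_rules(sigma: Iterable[str], n: int) -> Dict[LexicalItem, Dict[str, str]]:
    """Build the copying rules for inputs of length at most n."""
    rules: Dict[LexicalItem, Dict[str, str]] = {}
    # One rule per input character a and square i, 1 <= i <= n.
    for i in range(1, n + 1):
        for a in sigma:
            rules[(a, i)] = {"PHASE": "READ", f"INPUT_{i}": a}
    # Endmarker rules over the wider range 0 <= i <= n + 1.
    for i in range(0, n + 2):
        rules[("$", i)] = {"PHASE": "READ", f"INPUT_{i}": "$"}
    return rules


rules = lexical_rules("ab", 2)
print(grammar_input("ab"))
print(len(rules))
```

Because endmarker rules exist for every square, every item of `grammar_input("a")` is also covered by `lexical_rules("ab", 2)`, which is the sense in which the rules also serve shorter inputs.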

Together with the specially formatted grammar input, these rules set up
the input tape of the simulated ATM. We must also initialize the features
that encode the work tape contents, the machine state, and the tape head
positions. The initialization is completed by defining the distinguished start
category START correctly:

    START = {[INPUTPOS 1], [WORKPOS 1]}
            ∪ {[STATE q0], [PHASE START]}
            ∪ {[WORK_j #] : 1 ≤ j ≤ S(|w|)}

The following two ID rules are used to join the two subtrees together:

    START → {[PHASE RUN]}, {[PHASE READ]}

    {[PHASE READ]} → {[PHASE READ]}, {[PHASE READ]}

(Here  all  daughters  are  head  daughters.)  The  [PHASE  READ]  rule  allows  the 
input-reading  portion  of  the  tree  to  branch  as  many  times  as  necessary  to 
cover  the  input  characters. 

In our formal model of ATMs, the machine halts and accepts if it ever
enters an accepting state q ∈ Final. Thus, for every such state we need a
null-transition ID rule that will terminate the simulated computation tree.
For every q ∈ Final, the following ID rule should be included:

    {[STATE q], [PHASE RUN]} → ε

However, the most important ID rules are still to come; they encode the
next-move relation δ of the machine. Recall that

    δ ⊂ (Q' × Γ × Γ) × (Q × Γ × {Left, Right} × {Left, Right})

is a relation between tuples

    (state, input tape symbol, work tape symbol)

on the one hand and tuples

    (new state, new work tape symbol,
     input head movement, work head movement)

on the other hand. We describe δ as

    δ(q, a, b) = {(q', b', d_i, d_w) :
                  ((q, a, b), (q', b', d_i, d_w)) ∈ δ}.

With this notation, we may specify the ID rules that encode δ. A set of
rules will be specified for every state q ∈ Q' and all tape symbols a and b,
thus covering all of δ. No rules are to be constructed when δ(q, a, b) = ∅.
For q ∈ Q' with δ(q, a, b) ≠ ∅ there are two cases depending on whether q
is existential or universal. In either case, the construction must be carried
out for all possible input-head positions i (0 ≤ i ≤ |w| + 1) and work-head
positions j (1 ≤ j ≤ S(|w|)):

1. If q is in E (an existential state), include an instance of the following
   ID rule for every (q', b', d_i, d_w) ∈ δ(q, a, b):

       {[INPUTPOS i], [INPUT_i a],
        [WORKPOS j], [WORK_j b],
        [STATE q], [PHASE RUN]} →
       {[INPUTPOS i + d_i], [INPUT_i a],
        [WORKPOS j + d_w], [WORK_j b'],
        [STATE q'], [PHASE RUN]}

   Each of these rules propagates the value on the input tape, changes the
   value on the work tape, moves the heads, and changes the automaton
   state; note that all have the same left-hand side. Because several such
   rules are included, only one daughter computation has to succeed. The
   lone daughter in each such rule is a head daughter.

2. If q is not in E, it must be in U instead (a universal state). For
   this case, let L → R_1, ..., L → R_p be the rules that would have been
   constructed according to case 1 if q had been an existential state.
   Then include the rule

       L → R_1, ..., R_p

   instead of those rules. Again, every daughter is a head daughter.
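The two cases can be sketched as a rule generator. This is a minimal illustration, assuming δ is given as a dictionary from (q, a, b) to a list of moves and that categories are frozen sets of (feature, value) pairs; the names `category` and `id_rules` are hypothetical, and no attempt is made here to discard moves that would carry a head off its tape.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Category = FrozenSet[Tuple[str, object]]
Move = Tuple[str, str, int, int]        # (q', b', d_i, d_w)
Rule = Tuple[Category, List[Category]]  # (mother, head daughters)

LEFT, RIGHT = -1, +1


def category(input_pos: int, i: int, a: str,
             work_pos: int, j: int, b: str, q: str) -> Category:
    """A [PHASE RUN] category mentioning input square i and work square j."""
    return frozenset({
        ("INPUTPOS", input_pos), (f"INPUT_{i}", a),
        ("WORKPOS", work_pos), (f"WORK_{j}", b),
        ("STATE", q), ("PHASE", "RUN"),
    })


def id_rules(delta: Dict[Tuple[str, str, str], List[Move]],
             existential: Set[str], n: int, space: int) -> List[Rule]:
    """Encode delta as ID rules for |w| = n and space bound S(|w|) = space."""
    rules: List[Rule] = []
    for (q, a, b), moves in delta.items():
        if not moves:
            continue  # no rules when delta(q, a, b) is empty
        for i in range(0, n + 2):
            for j in range(1, space + 1):
                mother = category(i, i, a, j, j, b, q)
                daughters = [category(i + d_i, i, a, j + d_w, j, b2, q2)
                             for (q2, b2, d_i, d_w) in moves]
                if q in existential:
                    # Case 1: one rule per move, each with a lone daughter.
                    rules.extend((mother, [d]) for d in daughters)
                else:
                    # Case 2: a single rule with every successor as a daughter.
                    rules.append((mother, daughters))
    return rules


delta = {("q0", "a", "#"): [("q1", "x", RIGHT, RIGHT), ("q2", "y", RIGHT, LEFT)]}
existential_rules = id_rules(delta, {"q0"}, n=1, space=2)
universal_rules = id_rules(delta, set(), n=1, space=2)
print(len(existential_rules), len(universal_rules))
```

The number of rules grows with |δ| and with the number of head positions, not with the number of configurations; the untouched tape squares are left to the HFC, as described above.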
With these rules, the construction is almost finished; only a few loose
ends remain. The syntactic categories used in our GPSG are formally
specified as follows:

    Feat = {STATE, INPUTPOS, WORKPOS, PHASE}
           ∪ {INPUT_i : 0 ≤ i ≤ |w| + 1}
           ∪ {WORK_j : 1 ≤ j ≤ S(|w|)}

    Atom = Feat

    ρ0(f) = Q,                              if f = STATE
            the set {i : 0 ≤ i ≤ |w| + 1},  if f = INPUTPOS
            the set {j : 1 ≤ j ≤ S(|w|)},   if f = WORKPOS
            Σ ∪ {$},                        if f = INPUT_i for some i
            the ATM tape alphabet Γ,        if f = WORK_j for some j
            the set {START, READ, RUN},     if f = PHASE

The set of head features is defined to consist of the INPUT_i features and
the WORK_j features. In addition, we need feature co-occurrence restrictions
to ensure full specification of all non-null categories. For every f ∈ Atom,
include the FCR [STATE] ⊃ [f].
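A small sketch makes the size of this feature system concrete. It assumes the feature names are represented as strings and the value ranges as Python sets; the function name `feature_system` and the pair encoding of FCRs are illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple


def feature_system(states: Iterable[str], sigma: Iterable[str],
                   gamma: Iterable[str], n: int, space: int
                   ) -> Tuple[List[str], Dict[str, Set[object]], List[Tuple[str, str]]]:
    """Build Feat, the value ranges, and the FCRs [STATE] => [f]."""
    feat = ["STATE", "INPUTPOS", "WORKPOS", "PHASE"]
    feat += [f"INPUT_{i}" for i in range(0, n + 2)]
    feat += [f"WORK_{j}" for j in range(1, space + 1)]

    rho: Dict[str, Set[object]] = {
        "STATE": set(states),
        "INPUTPOS": set(range(0, n + 2)),
        "WORKPOS": set(range(1, space + 1)),
        "PHASE": {"START", "READ", "RUN"},
    }
    for i in range(0, n + 2):
        rho[f"INPUT_{i}"] = set(sigma) | {"$"}
    for j in range(1, space + 1):
        rho[f"WORK_{j}"] = set(gamma)

    # One FCR per atomic feature: a category with STATE must specify f.
    fcrs = [("STATE", f) for f in feat]
    return feat, rho, fcrs


feat, rho, fcrs = feature_system({"q0", "qf"}, {"a"}, {"a", "$", "#"}, n=2, space=3)
print(len(feat), len(fcrs))
```

The number of features is 4 + (n + 2) + S(n), which stays polynomial in the input length whenever S(n) is a polynomial.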

Inspection  of  the  construction  steps  shows  that  the  reduction  may  be 
performed  in  polynomial  time  in  the  size  of  the  simulated  ATM.  (Note  that 
the  grammar  we  construct  encodes  only  the  description  of  the  machine  that 
produces  the  computation  tree — not  the  potentially  infinite  computation 
tree  itself.) 

No  metarules  or  LP  statements  are  needed,  although  metarules  could 
have  been  used  instead  of  the  head  feature  convention.  Both  devices  are 
capable  of  transferring  the  contents  of  the  ATM  tape  from  the  mother  to 
the  daughter(s).  One  metarule  would  be  needed  for  each  tape  square/tape 
symbol  combination  in  the  ATM. 

GKPS definition 5.14 of admissibility (p.104) guarantees that
admissible trees must be terminated. By the construction above, a [PHASE RUN]
node can be terminated only if it represents an accepting configuration. In
particular, a [PHASE RUN] node cannot be terminated by a lexical rule,
because all constructed lexical rules are [PHASE READ]. This means the only
admissible trees are accepting ones whose yield is the input string followed
by a very long empty string. □

A.2  Number of CF English Productions

For GPSG and all its variants, the only grammar directly usable by the
Earley algorithm, that is, with the same complexity as a context-free grammar,
is the set of admissible local trees. I estimate the number of local trees in
the following “typical” RGPSG for English and show, in accordance with
earlier estimates (see Shieber 1983:137), that this set is astronomical. Any
recognition procedure that explicitly calculates or uses the set of admissible
local trees can only result in a slower recognition time than one that does
not.

Consider the simplest ID rule 32 in the RGPSG for English.

    VP → [1] :          (32)

The VP mother may receive multiple values (or remain unspecified) for
the atomic-valued features CASE, GER, NEG, POSS, REMOR, WHMOR, AUX,
INV, LOC, PAST, PER, PLU, PRD, or VFORM. Assume that each feature is
binary. Then 3^14 possible extensions of the VP are licensed, since each
feature may be +, -, or unspecified. VP may also receive many AGR
specifications in which the atomic-valued features CASE, COMP, GER, NEG, NFORM,
POSS, REMOR, WHMOR, ADV, AUX, INV, LOC, N, PAST, PER, PLU, SUBJ,
V may receive multiple values or be undefined. The daughter’s feature
values are fixed by the lexicon and the HFC, so the ID rule 32 corresponds to
3^14 · 3^19 = 3^33 unanalyzable context-free productions. The GPSG equivalent
of 32 corresponds to significantly more context-free productions due to the
combinatorial possibilities of embedded categories in GPSG.

The ID rule 33 is slightly more complicated.

    VP → [2] : NP          (33)

The VP mother in 33 may bear all of the features of the VP mother in the
rule 32, plus it may also bear the category-valued features SLASH or WH, or
RE, because these foot features can be instantiated on the NP daughter. ID
rule 33 therefore corresponds to approximately 3^14 · (3^19)^3 = 3^71 > 10^33
unanalyzable context-free productions. We would expect that only some of
these 10^33 context-free productions are really legitimate rules of an English
RGPSG. Even if we were able to exclude the invalid extensions from
consideration, the RGPSG for English would still contain an astronomical number
of context-free productions, and the GPSG for English still more.
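The two counts can be checked with exact integer arithmetic. The sketch below simply evaluates the formulas given in the text; the variable names are illustrative.

```python
# Each binary feature may be +, -, or unspecified.
VALUES_PER_FEATURE = 3

# 14 atomic-valued features on the VP mother.
mother_extensions = VALUES_PER_FEATURE ** 14
# The text's count of values for one category-valued specification.
category_values = VALUES_PER_FEATURE ** 19

# ID rule 32: mother features times AGR specifications.
rule_32 = mother_extensions * category_values
# ID rule 33: the text's formula uses three category-valued factors.
rule_33 = mother_extensions * category_values ** 3

print(rule_32 == 3 ** 33)
print(rule_33 == 3 ** 71, rule_33 > 10 ** 33)
```

Python integers are arbitrary precision, so both comparisons are exact rather than floating-point approximations.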
Significantly underspecified ID rules such as the binary coordination
schema correspond to an even greater number of context-free productions.
In the following estimate, I count the three mutually-exclusive atomic head
features NFORM, PFORM, VFORM as one feature, and ignore the features NULL,
CONJ, COMP since their distribution is extremely limited. I must also ignore
the positive Kleene star categories of the iterating coordination schema,
because any ID rule containing them corresponds to an infinite number of
context-free productions.

The 12 atomic head features can receive any value on either head
daughter (= (3^12)^2) and the 9 non-head atomic features can receive any value on
any of the three categories in the rule (= (3^9)^3). Because the foot features
WH and SLASH are mutually exclusive, there are effectively 3 category-valued
features: the head feature AGR may take 3^(12+9) values on either head
daughter (= (3^21)^2), while the two foot features may each take 3^21 possible values
on only one category (= (3^21)^2). Thus, the BCS corresponds to

    (3^12)^2 · (3^9)^3 · (3^21)^2 · (3^21)^2 = 3^135 > 10^64

context-free rules. In short, even the more constrained RGPSG framework
licenses an astronomical number of context-free productions.
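The estimate for the binary coordination schema can be verified the same way. The sketch below multiplies the four factors named in the text; the variable names are illustrative.

```python
# 12 atomic head features on each of the two head daughters.
head_atomic = (3 ** 12) ** 2
# 9 non-head atomic features on each of the three categories.
nonhead_atomic = (3 ** 9) ** 3
# AGR: 3^(12+9) values on each of the two head daughters.
agr = (3 ** (12 + 9)) ** 2
# Two foot features, each with 3^21 values on one category.
foot = (3 ** 21) ** 2

bcs_total = head_atomic * nonhead_atomic * agr * foot

print(bcs_total == 3 ** 135)
print(bcs_total > 10 ** 64)
```

The exponents add up as 24 + 27 + 42 + 42 = 135, which is where the figure 3^135 comes from.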

A.3  Practical Consequences of GPSG’s Complexity

Here I argue that the GPSG theory of GKPS is difficult to understand and
prone to massive overgeneration. The exclusive use of extensional (i.e.
non-constructive) definitions, when coupled with the vast array of extensional
possibilities and incompletely specified relationships among formal devices,
means that knowledge of principles does not translate into an ability to
determine the consequences of that knowledge. It is difficult, if not
impossible, for the linguist to understand the consequences of a particular GPSG
system, in part because the formal system is so intractable.

Not only are there an extra-astronomical number of syntactic categories
in GPSG, but the computations performed on them are extremely
intricate. Head features are, in principle, those features that must agree on the
mother and head daughters in a local tree. Consider the head features SUBJ,
SUBCAT, SLASH, and BAR, which are actually required to disagree in nearly
all cases. SUBCAT, for example, must never be equivalently specified on the
mother and head daughters, except when it remains unspecified on both.
ID rules with lexical heads predominate, and in those rules BAR, SUBJ,
and SLASH features must never agree. The GKPS solution to this problem
is to make feature instantiation operate only on those (non-problematic)
features that may be freely equated: absolute feature specification
identity is not possible to enforce, either “because the ‘problematic’ feature
specification is stipulated in the rule, or because its presence or absence is
required by the FCRs, or because its presence or absence is required by
the FFP or CAP.” (p.95) In short, principles of universal feature
instantiation can only be understood relative to all other formal devices. The
dynamic nature of feature instantiation, when compounded with extensional
(i.e. non-constructive) definitions of the foot feature principle (FFP),
control agreement principle (CAP), and head feature convention (HFC), results
in a linguistic theory that is extremely difficult to understand.

As shown above, the ‘problematic feature specification’ solution
introduces significant additional complexity in the theory of syntactic categories:
universal feature instantiation plays a central role in the EXP-POLY
time-hard reduction. The additional complexity is concealed by extensional
definitions: admissible local trees are defined in terms of all projections of an
ID rule that meet the FFP, CAP, and HFC. The interactions of these three
grammatical components remain unspecified, and the actual role of FCRs
is similarly never explained. If FCRs apply after the HFC, all lexical heads
would first be specified for [BAR 2] and then eliminated; if FCRs apply
before the HFC, all lexical heads would be specified for [BAR 2] and then
admitted. The GKPS solution to this dilemma is to apply FCRs and the
HFC simultaneously to the ID rule projections, along with the FSDs and
other principles of universal feature instantiation.

One  might  think  from  these  extensional  definitions  that  the  FFP,  CAP, 
HFC,  FCRs,  and  FSDs  apply  relatively  independently.  Even  worse,  one 
might  associate  order  of  presentation  (FFP,  then  CAP,  then  HFC,  and  finally 
FSDs)  with  formal  dependence.  In  fact,  their  interactions  are  intricate  and 
highly  structured.  For  example,  the  CAP  depends  heavily  on  the  HFC 
because  head  feature  specifications  in  part  determine  the  semantic  types 
relevant  to  the  definition  of  control.  One  of  the  goals  of  Revised  GPSG,  as 
presented  here,  is  to  unravel  these  dependencies  and  expose  the  underlying 
structure  of  feature  instantiation  in  natural  language  grammars. 

A  major  caveat  regarding  the  following  examples  is  in  order.  The  major 
empirical  support  for  the  claim  that  the  interactions  among  GPSG’s  formal 
devices are unpredictable is the theory's computational intractability, as
shown  above  in  section  4.  I  in  no  way  consider  errors  in  GKPS  to  be  a 
major  source  of  empirical  evidence  for  this  claim.  The  errors  are  anecdotal, 
circumstantial,  and  probably  unavoidable  in  any  system  as  complex  as  a 
descriptively  adequate  linguistic  theory  —  at  best  these  errors  constitute 
auxiliary  support  for  this  claim,  and  for  this  reason  appear  in  the  appendix. 

These  interactions  manifest  themselves  in  extremely  subtle  ways.  A 
case  in  point  is  the  GKPS  analysis  of  unbounded  dependency  constructions 
(UDC’s).  These  constructions  are  said  to  have  a  top,  middle,  and  bottom. 
The  top  of  topicalization  and  wh-movement  constructions  is  supplied  by  the 
ID  rule  34, 

S → X2, H/X2  (34)

which  is  repeatedly  assumed  in  GKPS — throughout  chapter  5  and  in  the 
elucidation  of  the  CAP,  e.g.  local  tree  25c  on  p.90 — to  result  in  local  trees 
of  the  form 

S

NP[PER 3, -PLU]

S[SLASH NP[PER 3, -PLU]]


where  the  CAP  requires  the  value  of  SLASH  on  the  head  to  be  identical  to 
the  controller  X2  (that  is,  the  two  X2  categories  of  the  ID  rule  34  must 
agree).  This  local  tree  is  an  egregious  violation  of  the  HFC,  because  the 
non-problematic  head  feature  SLASH  is  present  on  the  head  daughter  but 
not  the  mother. 

A  tree  compatible  with  the  HFC,  CAP,  and  FFP  follows  immediately. 
Note  that  this  is  the  most  general  tree,  because  the  FFP  requires  identical 
instantiation  of  SLASH  values  on  the  mother  and  daughters. 

S[SLASH NP[PER 3, -PLU]]

NP[PER 3, -PLU, SLASH NP[PER 3, -PLU]]

S[SLASH NP[PER 3, -PLU]]


An immediate consequence of the HFC, then, is that UDC constructions
are topless, consisting of an infinite iteration of NP's missing NP's
internally (FSD 3 prevents +NULL from being instantiated on the NP/NP
daughter). A UDC cannot terminate without introducing another UDC. The
obvious solution is to remove SLASH from the set of head features; it was
unclear why SLASH should be a head feature from the start, since most heads
are prevented from being specified with SLASH by FCR 6 (and by FSD 3,
indirectly through FCR 19). The strongest argument for including SLASH in
HEAD arises from the GKPS account of parasitic gap facts. Removing SLASH
from HEAD also simplifies the definition of the control relation to only
consider HEAD and inherited FOOT feature specifications, rather than
non-FOOT HEAD features and inherited FOOT features, when determining the
semantic types relevant to the definition of control (see GKPS p.87).

Unfortunately, our troubles are not over yet. It is perfectly admissible
to instantiate a SLASH feature on the mother, provided we satisfy the FFP
and instantiate it on at least one daughter. Since SLASH cannot be
instantiated twice on the daughter head, it must be instantiated
identically on the daughter NP.

S[SLASH NP[PER 3, -PLU]]

NP[PER 3, -PLU, SLASH NP[PER 3, -PLU]]

S[SLASH NP[PER 3, -PLU]]


The  CAP  is  satisfied:  the  value  of  the  SLASH  feature  on  the  agreement  target 
(i.e.  the  head  daughter)  is  identical  to  the  .^specifications  of  the  controller 
unified  with  the  value  of  the  SLASH  feature  on  the  controller.  (In  plainer 
English,  the  ^-specifications  are  the  head  but  not  foot  specifications,  plus 
any  inherited  foot  specifications.  In  this  instance,  the  ^specifications  of  the 
controller are the features {[+N, -V, BAR 2, PER 3, -PLU]}.)

In  conclusion,  the  only  permissible  local  tree  in  the  entire  preceding 
discussion  is: 

S[SLASH NP[PER 3, -PLU]]

NP[PER 3, -PLU, SLASH NP[PER 3, -PLU]]

S[SLASH NP[PER 3, -PLU]]


Therefore,  even  if  we  remove  SLASH  from  the  set  of  head  features,  the  GKPS 
grammar  for  English  will  allow  an  infinite  class  of  ungrammatical  utterances, 
such  as  that  shown  in  figure  9. 


Figure 9: This is a problematic topicalization structure in GPSG, where
[SLASH NP[3s]] has been instantiated on both the mother S and its NP[3s]
nonhead daughter, in accordance with the FFP. The bad extraction is marked by
a dark line.


Other examples of unexpected interactions among the principles of
universal feature instantiation can be found in GKPS. One example is the
definition of the FFP given in GKPS. Consider the ID rules generated by the
STM2 metarule, which are of the form:

A/B → C, D

Local  trees  of  the  class 

A/B 

C/F 

D/G 
meet the FFP because the unification of instantiated foot features on the
daughter categories is undefined for SLASH (the unification of F and G is
undefined), and $\phi(A/B) \mid \mathrm{FOOT} \sim A/B$ is also undefined
because $\mathrm{FOOT} \sim A/B$ is the empty set.

The RGPSG FFP fixes this problem as follows. Any foot feature
specification that is instantiated on a daughter category in an RGPSG local
tree must also be instantiated in the mother category, and that specification
must be identical to an instantiation of the same feature on other daughter
categories. This revised FFP also ensures that inherited foot features on the
mother prevent an instantiation of those foot features on any daughters.
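
The revised FFP just stated can be illustrated with a minimal Python sketch. The representation is an assumption made only for illustration: each category is split into `inherited` specifications (those given by the ID rule) and `instantiated` specifications, foot feature values are plain strings, and the names `Category` and `meets_revised_ffp` are hypothetical. Only the three conditions stated in the paragraph above are checked.

```python
from dataclasses import dataclass, field

# Foot features of the English grammar in appendix C.
FOOT = {"SLASH", "WH", "RE"}

@dataclass
class Category:
    # Feature specifications supplied by the ID rule (assumed representation).
    inherited: dict = field(default_factory=dict)
    # Feature specifications added by feature instantiation.
    instantiated: dict = field(default_factory=dict)

def meets_revised_ffp(mother, daughters):
    """Check the three conditions of the revised FFP on one local tree."""
    for daughter in daughters:
        for feature, value in daughter.instantiated.items():
            if feature not in FOOT:
                continue
            # An inherited foot feature on the mother blocks instantiation
            # of that feature on every daughter.
            if feature in mother.inherited:
                return False
            # The specification must be instantiated identically on the
            # mother; this also forces agreement among the daughters.
            if mother.instantiated.get(feature) != value:
                return False
    return True

# The problematic STM2 local tree A/B -> C/F, D/G is now rejected:
# SLASH is inherited on the mother, so no daughter may instantiate it.
stm2_tree_ok = meets_revised_ffp(
    Category(inherited={"SLASH": "B"}),
    [Category(instantiated={"SLASH": "F"}),
     Category(instantiated={"SLASH": "G"})],
)
```

Because every instantiated daughter value is compared against the mother, two daughters that disagree can never both pass, which is the "identical on other daughter categories" condition.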
B  Complexity of RGPSG Recognition


This appendix contains a formal proof that the universal recognition problem
for RGPSGs is NP-complete. That is, the problem of determining for an
arbitrary RGPSG G and input string w whether w is in the language L(G)
generated by G, is NP-complete.

B.1  RGPSG Recognition is NP-complete

Let P be the set of ID rules resulting from applying metarule biclosure, UFI,
and simple defaults to the set of ID rules given in the RGPSG G. Recall
that P contains $O(|G|^5)$ symbols.

Lemma B.1 Let $\langle \varphi_0, \ldots, \varphi_k \rangle$ be a shortest
leftmost derivation of $\varphi_k$ from $\varphi_0$ in an RGPSG G containing
at least one branching production.19 If $k > |P|$, then
$|\varphi_k| > |\varphi_0|$.

Proof. In the derivation step $\varphi_i \Rightarrow \varphi_{i+1}$, where
$\varphi_i = \alpha A' \beta$ and $\varphi_{i+1} = \alpha \gamma' \beta$ for
$\alpha \in V_T^*$, $\beta \in (V_T \cup K)^*$, one of the following cases
must hold:

1. The production $A \to \gamma$ with extension $A' \to \gamma'$ is
nonbranching ($|\gamma| = 1$). In the worst case, we could cycle through
every possible nonbranching production (without using a branching
production), after which we would begin to reuse them. Any extension of a
production that has already been used in this run of nonbranching
productions could have been guessed previously, and the length of the
shortest nonbranching run must be less than $|P|$.

2. The production $A \to \gamma$ with extension $A' \to \gamma'$ is
branching ($|\gamma| > 1$). Then $|\varphi_i| < |\varphi_{i+1}|$.

A total of at most $n - 1$ branching productions derives an utterance of
length $n$, because there are no null-transitions in an RGPSG.20 Each
branching production can be separated from the closest other branching
production in the derivation by a run of at most $|P|$ nonbranching
productions, and the shortest derivation of $x$ will be of length
$\Theta(|P| \cdot |x|) = O(|G|^5 \cdot |x|)$. □

19 If the RGPSG G does not contain a branching production, then L(G) contains
only strings of length one and all shortest derivations are shorter than
$|P|$: membership for such a grammar is clearly in NP.

20 The only null-transition is the lexical element for the category
X2[+NULL]_1/X2_1, as shown in 8. Null-transitions may, at the worst, convert
the extension of a branching

Theorem 7 RGPSG Recognition is in NP.

Proof. On input RGPSG G and input string $x \in V_T^*$, guess a derivation
of $x$ in nondeterministic polynomial time as follows.21

1. Compute the set P of ID rules resulting from applying the simple
defaults (while respecting the principles of UFI) to the output of metarule
biclosure BC(M,R). This can be done in deterministic polynomial time
(see above).

2. Guess an extension S' of the start category S, and let S' be the
derivation string.

3. For a derivation string $\alpha A' \beta$, where $\alpha \in V_T^*$,
$\beta \in (V_T \cup K)^*$, guess a production $A \to \gamma$ and extension
$A' \to \gamma'$ of it. Let $\alpha \gamma' \beta$ be the new derivation
string.

4. If $\alpha \gamma' \beta = x$, accept.

5. If $|\alpha \gamma' \beta| > |x|$, reject.

6. Loop to step 3 (at most $|P| \cdot |x|$ times).

Every loop of the nondeterministic algorithm performs one step in the
derivation. By lemma B.1, the shortest derivation of $x$ is at most of length
$\Theta(|P| \cdot |x|)$, so we need to loop through the algorithm at most
that many times. Guessing an extension of a category may be performed in time
$O(|\mathrm{Cat}| \cdot |\mathrm{Atom}|)$, and an extension of a production
may be guessed in time $O(|\mathrm{Cat}| \cdot |\mathrm{Atom}| \cdot |P|)$.
This nondeterministic algorithm runs in polynomial time and accepts exactly
L(G); hence RGPSG Recognition is in NP. □
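
The guess-and-check loop of steps 2 through 6 can be illustrated by a deterministic simulation that tries every guess in turn. This is only a sketch under strong simplifying assumptions: categories are atomic symbols, so the "extension" guessing of steps 2 and 3 is omitted, and the grammar is a plain context-free grammar with no empty right-hand sides. The function name `recognize` and the toy grammar are hypothetical. Exploring all guesses takes exponential time in general; only the per-branch step bound mirrors the proof.

```python
def recognize(rules, start, x):
    """Search leftmost derivations of x, bounded as in steps 4-6.

    rules maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of rules is a terminal.
    """
    target = tuple(x)
    num_rules = sum(len(rhss) for rhss in rules.values())
    max_steps = num_rules * len(target)      # the |P| * |x| loop bound
    failed = set()                           # (form, steps) pairs known to fail

    def search(form, steps):
        if form == target:                   # step 4: accept
            return True
        if len(form) > len(target):          # step 5: reject
            return False
        if steps == max_steps:               # step 6: loop bound exhausted
            return False
        if (form, steps) in failed:
            return False
        failed.add((form, steps))
        # Locate the leftmost nonterminal A'.
        for pos, symbol in enumerate(form):
            if symbol in rules:
                break
        else:
            return False                     # all terminals, but not x
        if form[:pos] != target[:pos]:       # terminal prefix already wrong
            return False
        # Step 3: try each production for A' in place of guessing one.
        return any(
            search(form[:pos] + rhs + form[pos + 1:], steps + 1)
            for rhs in rules[symbol]
        )

    return search((start,), 0)

# Toy grammar: S -> S S | A, A -> a (one branching production).
toy = {"S": [("S", "S"), ("A",)], "A": [("a",)]}
```

For the toy grammar, `recognize(toy, "S", "aa")` succeeds within the six-step budget (three productions times two input symbols), while any string containing a symbol other than `a` is rejected.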

The  idea  of  the  following  hardness  proof  arose  during  a  discussion  with 
Ed  Barton  and  Robert  Berwick. 

production to a nonbranching one in the derivation because no head daughter
may bear the [+NULL] specification, and therefore null-transitions may in
effect be compiled out of the derivation when we choose an ordered extension
to expand the nonterminal A'.

21 Again, we assume P contains at least one branching production. If not,
then we should only loop as many times as there are productions, and then
halt.

Theorem 8 RGPSG Recognition is NP-hard.

Proof. We reduce 3SAT to RGPSG Recognition in polynomial time. Given
a 3CNF formula $f$ of length $m$ using the $n$ variables $q_1, \ldots, q_n$,
we construct an RGPSG $G_f$ such that the string $w$ is an element of
$L(G_f)$ iff $f$ is satisfiable, where $w$ is the string of formula literals
in $f$. $G_f$ is constructed as follows:

1. $G_f$ includes the set Atom of atomic features
{STAGE, LITERAL, $q_1, \ldots, q_n$} with values defined by the function
$\rho$:

$\rho(\mathrm{STAGE}) = \{1, \ldots, n+3\}$

$\rho(\mathrm{LITERAL}) = \{+, -\}$

$\rho(q_i) = \{0, 1\}$

The set of head features HEAD is $\{q_1, q_2, \ldots, q_n\}$. The grammar
will assign truth values to the variables and check satisfaction in $n + 3$
stages as synchronized by the feature STAGE. The start category is
{[STAGE 1]}.

2. At each of the first $n$ stages, a value is chosen for one variable;
because the $q_i$ are head features, the values that are chosen will be
maintained throughout the derivation tree by the HFC. The following $2n$
nonbranching rules are needed, constructed for all $i$, $1 \le i \le n$. All
daughters are heads.

{[STAGE $i$], [$q_i$ 0]} → {[STAGE $i+1$], [$q_i$ 0]} :

{[STAGE $i$], [$q_i$ 1]} → {[STAGE $i+1$], [$q_i$ 1]} :

3. At stage $n + 1$, the grammar has guessed truth assignments for all
variables; all that remains is to use the truth assignments to generate
satisfied three-literal clauses. The following two rules generate enough
clauses to match the number of clauses in $w$:

{[STAGE $n+1$]} → {[STAGE $n+2$]} :

{[STAGE $n+1$]} → {[STAGE $n+1$]} {[STAGE $n+2$]} :

4. At stage $n + 2$, the grammar generates satisfied three-literal clauses,
that is, clauses containing at least one true literal. Let $C_0$ and $C_1$
be the following categories:

$C_0$ = {[STAGE $n+3$], [LITERAL -]}

$C_1$ = {[STAGE $n+3$], [LITERAL +]}

Then the following 7 ternary-branching rules are needed; any set of
three literals makes the clause true, provided at least one literal is
true:

{[STAGE $n+2$]} → $C_0$ $C_0$ $C_1$ :

{[STAGE $n+2$]} → $C_0$ $C_1$ $C_0$ :

{[STAGE $n+2$]} → $C_1$ $C_0$ $C_0$ :

{[STAGE $n+2$]} → $C_0$ $C_1$ $C_1$ :

{[STAGE $n+2$]} → $C_1$ $C_0$ $C_1$ :

{[STAGE $n+2$]} → $C_1$ $C_1$ $C_0$ :

{[STAGE $n+2$]} → $C_1$ $C_1$ $C_1$ :

5. Finally, lexical insertion at stage $n + 3$ ties together the
truth-values chosen for the variables and the literals. For every $q_i$,
$1 \le i \le n$, we need the following four lexical entries, bringing us to
a total of $6n + 9$ rules:

($q_i$, {[STAGE $n+3$], [LITERAL +], [$q_i$ 1]})

($q_i$, {[STAGE $n+3$], [LITERAL -], [$q_i$ 0]})

($\bar{q}_i$, {[STAGE $n+3$], [LITERAL +], [$q_i$ 0]})

($\bar{q}_i$, {[STAGE $n+3$], [LITERAL -], [$q_i$ 1]})

There are no category-valued features, LP statements, metarules, or simple
defaults in the RGPSG $G_f$ constructed by the reduction.

If some extension of the start category S = {[STAGE 1]} can be generated,
then the formula $f$ is satisfiable; each extension of the start category
that generates a string must encode a satisfying truth assignment. For
example, the category

{[STAGE 1], [$q_1$ 1], [$q_2$ 0], ..., [$q_n$ 1]}

generates 3CNF formulas $f$ with the satisfying truth assignment
$q_1 = 1, q_2 = 0, \ldots, q_n = 1$. Note that the RGPSG constructed in the
reduction generates all satisfiable 3CNF Boolean formulas, of any length,
using $n$ or fewer variables. □
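
The behaviour of the constructed grammar can be sketched in Python under stated assumptions. The sketch does not build RGPSG categories; it only mimics what the five groups of rules do. A literal is encoded as a hypothetical pair `(variable index, is_positive)`, the head features chosen in stages 1 to n are modelled by trying every truth assignment, and the names `rule_count`, `literal_value`, and `in_language` are illustrative only.

```python
from itertools import product

# The 7 ternary rules of step 4: every LITERAL pattern with at least one "+".
TRUE_PATTERNS = {p for p in product("+-", repeat=3) if "+" in p}

def rule_count(n):
    """2n stage rules + 2 clause rules + 7 ternary rules + 4n lexical entries."""
    return 2 * n + 2 + len(TRUE_PATTERNS) + 4 * n

def literal_value(literal, assignment):
    """LITERAL value licensed by the lexical entries of step 5."""
    variable, is_positive = literal
    return "+" if (assignment[variable] == 1) == is_positive else "-"

def in_language(w, n):
    """Decide whether the literal string w is generated, by brute force."""
    if len(w) == 0 or len(w) % 3 != 0:
        return False                      # stage n+1 yields whole clauses only
    for bits in product((0, 1), repeat=n):
        # One extension of the start category, fixed everywhere by the HFC.
        assignment = dict(zip(range(1, n + 1), bits))
        clauses = (w[k:k + 3] for k in range(0, len(w), 3))
        if all(
            tuple(literal_value(lit, assignment) for lit in clause) in TRUE_PATTERNS
            for clause in clauses
        ):
            return True
    return False

# (q1 or not-q2 or q2) is satisfiable; (q1 q1 q1)(not-q1 not-q1 not-q1) is not.
satisfiable = [(1, True), (2, False), (2, True)]
unsatisfiable = [(1, True)] * 3 + [(1, False)] * 3
```

Here `in_language(satisfiable, 2)` holds and `in_language(unsatisfiable, 1)` does not, and `rule_count(n)` reproduces the 6n + 9 total given in step 5.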

B.2  Computation  Tree  Reduction  for  RGPSG 

Theorem  9  RGPSG  Recognition  is  NP-hard. 

Proof. To establish this result, we will hide a polynomial depth pruned OR
computation tree in a polynomial depth branching RGPSG parse tree. The
major obstacle to our success is the RGPSG head feature convention, which
ensures that all heads dominated by a common head will have the same
head features. This means we cannot use the HFC to transfer unaltered
tape squares from a configuration to its successors as we did for GPSG
recognition.

The trick is that each RGPSG category will encode a polynomial depth
pruned OR computation tree! Assume, without loss of generality, that our
target computation tree has polynomial depth $d$ and uses space $s$. We
create $s$ atomic features $q_1, \ldots, q_s$ to encode the tape squares, as
before, and $d$ category-valued head features $f_1, \ldots, f_d$ to represent
a path in the unpruned OR computation tree, which is merely a pruned OR tree
or a straight line. We will need the atomic features state and head to
represent the machine state and head position, respectively.

We will use the ID rules to transfer the tape contents encoded in the
0-level categories from one embedded category to another. Thus, in the
parse tree, embedded categories will obey the next-move relation. The head
feature convention serves no useful role in this reduction; it merely ensures
that all categories in the resultant parse tree are identical.

For each transition $t$ found in $\delta$ we will create
$s \cdot d\,(1 + 2(s - 1)) + 1 = \Theta(d \cdot s^2)$ ID rules as follows.
Assume wlog that $t$ changes a tape square from $\sigma$ to $\sigma'$ and
state from $q$ to $q'$, and moves the tape head right. For each possible
tape square $i$ altered by $t$, $1 \le i \le s$, and each $f_j$ other than
representing a configuration $t$ could apply to, create the ID rule

{[$f_j$ {[state $q$], [head $i$], [$q_i$ $\sigma$]}]} →
{[$f_{j+1}$ {[state $q'$], [head $i+1$], [$q_i$ $\sigma'$]}]} : '$ji|$'

and the $2(s - 1)$ ID rules for all tape squares $k$, $k \ne i$:

{[$f_j$ {[state $q$], [head $i$], [$q_k$ 1]}]} →
{[$f_{j+1}$ {[state $q'$], [head $i+1$], [$q_k$ 1]}]} : '$jk|$'

{[$f_j$ {[state $q$], [head $i$], [$q_k$ 0]}]} →
{[$f_{j+1}$ {[state $q'$], [head $i+1$], [$q_k$ 0]}]} : '$jk|$'

We need one more ID rule to terminate accepting configurations, which are
in a special accept state $q_a$:

{[$f_d$ {[state $q_a$]}]} → accept
The distinguished start category will encode the root node of the
computation tree embedded in the category-valued feature $f_1$.

Note that the parse tree successfully simulates the target computation
tree iff it yields a string containing every possible pair $ji$ for
$1 \le j < d$ and $1 \le i \le s$. The '$ji|$' substring in the terminal
string indicates that any successful derivation of the terminal string has
transferred the contents of the $i$th tape square from the $j$th
configuration to its successor (the $(j+1)$th configuration) in accordance
with the next-move relation. These terminal strings look like
'11|12|13|...|21|22|23|...|$ji$|...|$(d-1)s$|accept'. □
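
The rule count and the shape of the terminal string can be checked with a short Python sketch. It is an illustration under assumptions, not part of the proof: the rules are only enumerated, not built; the index ranges follow the count formula in the text (every $i$ from 1 to $s$, every $j$ from 1 to $d$); the terminal string assumes each rule contributes one '$ji|$' terminal followed by a final `accept`. The function names are hypothetical.

```python
def count_id_rules(s, d):
    """Count the ID rules created for one transition t."""
    rules = 0
    for i in range(1, s + 1):            # tape square altered by t
        for j in range(1, d + 1):        # category-valued feature f_j
            rules += 1                   # rule rewriting square i
            # two copying rules (values 0 and 1) per unaltered square k != i
            rules += 2 * sum(1 for k in range(1, s + 1) if k != i)
    return rules + 1                     # plus the accept rule

def terminal_string(s, d):
    """Terminal string listing every pair ji, 1 <= j < d, 1 <= i <= s."""
    pairs = [f"{j}{i}" for j in range(1, d) for i in range(1, s + 1)]
    return "|".join(pairs + ["accept"])

example_count = count_id_rules(3, 2)     # 3 * 2 * (1 + 2 * 2) + 1
example_string = terminal_string(2, 3)
```

For small parameters the enumeration agrees with the closed form $s \cdot d\,(1 + 2(s-1)) + 1$, which grows as $d \cdot s^2$.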
C  An RGPSG for English

Before presenting the English RGPSG in its entirety, we discuss some of the
more tricky aspects of converting the FCRs and FSDs of the GKPS English
grammar into RGPSG.

We must stop +POSS from getting forced on everything, perhaps with
an SD If true Then [POSS noBind]. What does +POSS mean in GKPS
anyway? Their analysis of NP[+POSS] and PP[+POSS] must be clarified.

We duplicate FSD 7: [BAR 0] ⊃ ¬[VFORM PAS], which prevents random
lexical categories from assuming a passive alternate, in a complicated
and obtuse manner. We introduce a new head feature ±PAS to indicate
passive sentences and verb phrases, and no longer allow the VFORM feature
to take PAS as a value. We also include four SDs to ensure that VFORM and
PAS are mutually exclusive, and that PAS only appears in [+V, -N]
categories. While this solution allows us to avoid both the implicit
disjunctive consequence of FSD 7 and the morass of problematic feature
specifications, it is linguistically dubious. RGPSG (incorrectly) claims that
human languages can have passive categories that are also finite,
infinitival, and so on. On the other hand, this is just as offensive as the
GPSG/RGPSG claim that some human languages can specify a category for the
three features VFORM, NFORM, and PFORM simultaneously. The proper solution to
this problem is to introduce finer internal structure in feature
specifications. A more intricate version of the tree-like theory of features
proposed in Gazdar and Pullum (1982) or the complex-symbol rules of Chomsky
(1965) appear to be more linguistically and computationally desirable for
these reasons.

FCR 10: [+INV, BAR 2] ⊃ [+SUBJ] prevents +INV from "dripping through"
the VP by the HFC. Note that +INV is only introduced on the mother
V2[+SUBJ] category of lexical ID rules, and that the only instance of
V2[+SUBJ] as the head daughter of a (potential) VP is in the ID rule 35

V2 → X2 : X2[+ADV]  (35)

We include the simple default SD: If [-SUBJ] Then [-INV] to prevent
+INV from rising through any daughter VP's. Alternately, we could
replace 35 with 36, which could be linguistically incorrect.

V2[-INV] → X2 : A2[+ADV]  (36)

Lastly, GKPS fn.2 on page 73 says that the category S[VFORM PAS] is
invalid in English, yet GKPS fail to enforce this constraint in their GPSG
for English. If it is actually desirable to rule out the suspect category,
then include SD: If [+SUBJ] Then [-PAS].

Without  further  ado,  the  following  RGPSG  is  the  result  of  translating 
the  GKPS  grammar  for  English  into  the  RGPSG  formal  system. 


SYNTACTIC FEATURES

CASE      {acc, nom}
COMP      {for, that, whether, if, noBind}
CONJ      {and, both, but, neither, either, nor, or, noBind}
GER       {+, -}
NEG       {+, -}
NULL      {+, -}
POSS      {RECP, REFL}
REMOR     {RECP, REFL}
WHMOR     {R, Q, FR, EX}

; HEAD features
AGR       {}
ADV       {+, -}
AUX       {+, -}
INV       {+, -}
LOC       {+, -}
N         {+, -}
NFORM     {there, it, NORM}
PAS       {+, -}
PAST      {+, -}
PER       {1, 2, 3}
PFORM     {to, by, for, about, of, with, ....}
PLU       {+, -}
PRD       {+, -}
V         {+, -}
VFORM     {BSE, FIN, INF, PRP, PSP}

; BHEAD features
BAR       {0, 1, 2, noBind}
SUBCAT    {1..48, for, that, whether, if,
           and, both, either, neither, but, nor, or, not}
SUBJ      {+, -}

; FOOT features
RE        {}
SLASH     {}
WH        {}



; - ABBREVIATIONS -

S       ::= [+V, -N, +SUBJ, BAR 2]
VP      ::= [+V, -N, -SUBJ, BAR 2]
NP      ::= [-V, +N, BAR 2]
AP      ::= [+V, +N, BAR 2]
PP      ::= [-V, -N, BAR 2]
V       ::= [+V, -N]
N       ::= [+N, -V]
A       ::= [+N, +V]
P       ::= [-N, -V]
+it     ::= [AGR NP[NFORM it]]
+there  ::= [AGR NP[NFORM there]]
+NORM   ::= [AGR NP[NFORM NORM]]
+Q      ::= [WH NP[WHMOR Q]]
+R      ::= [WH NP[WHMOR R]]
Deg     ::= {[SUBCAT 23, BAR noBind]}
~F      ::= [F noBind]


; - SIMPLE DEFAULTS -

If [SUBCAT] Then [BAR 0]
If [SUBCAT] Then [SLASH noBind]
If (~[+PAS] & ~[PRP] & [+V, -N]) Then [PRD noBind]
If [+SUBJ, WH] Then [COMP noBind]
If [+SUBJ, INF] Then [COMP for]
If [+V, -N, -SUBJ] Then [AGR NP[NFORM NORM]]
If [SLASH] Then [WH noBind]
If [WH] Then [SLASH noBind]
If A1 Then [WH noBind]
If VP Then [WH noBind]
If true Then [NULL noBind]
If true Then [CONJ noBind]
If [-SUBJ] Then [-INV]
If [VFORM] Then [-PAS]
If [+PAS] Then [VFORM noBind]
If [SUBCAT] & [+V, -N] Then [-PAS]
If [-V] | [+N] Then [PAS noBind]

; to duplicate FCR 5, which appears to be useless ....
If [+SUBJ] | (~[FIN] & [VFORM]) Then [PAST noBind]

; these four are all questionable
If true Then [-INV]                 ; means extraposed S's must be matrix S's
If true Then [CASE NOM]             ; BUT no case defaults should be allowed
If [+N, -V, BAR 2] Then [CASE ACC]  ; no case defaults should be allowed
If [+SUBJ] Then [-PAS]              ; GKPS p.73 fn2 implies this is needed

; - ID RULES -

VP -> [1] :                                %(die eat sing run succeed weep occur
                                           %  dine elapse grow look)
VP -> [2] : NP                             %(sing love close prove succeed
                                           %  abandon enlighten castigate slap eat devour grow bring trade)
VP -> [3] : NP, PP[to]                     %(give sing throw hand trade)
VP -> [4] : NP, PP[for]                    %(buy cook reserve save trade)
VP -> [5] : NP, NP                         %(spare hand give buy trade)
VP -> [6] : NP, PP[+LOC]                   %(put place stand)
VP -> [7] : X2[+PRD]                       %(be)
; Sells p.130 claims this should be VP[+AUX] -> [7] : X2[+PRD]
VP -> [8] : NP, S[FIN]                     %(persuade convince tell)
VP -> [9] : (PP[to]), S[FIN]               %(concede admit)
VP -> [10] : S[BSE]                        %(prefer desire insist)
VP -> [11] : (PP[of]), S[BSE]              %(require)
VP[INF, +AUX, AGR NP] -> [12] : VP[BSE]    %(to)
VP -> [13] : VP[INF]                       %(continue tend seem want)
VP -> [14] : V2[INF, +NORM]                %(prefer intend)
VP -> [15] : VP[INF, +NORM]                %(try attempt want)
VP -> [16] : (PP[to]), VP[INF]             %(seem appear)
VP -> [17] : NP, VP[INF]                   %(believe expect)
VP -> [18] : NP, VP[INF, +NORM]            %(persuade force)
VP -> [19] : (NP), VP[INF, +NORM]          %(promise)
VP[AGR S] -> [20] : NP                     %(bother amuse)
VP[+it] -> [21] : (PP[to]), S[FIN]         %(seem appear)
VP[AGR NP[there, PLU_1]] -> [22] : NP[PLU_1]   %(be)
VP -> [40] : S[FIN]                        %(believe say regret)
VP -> [43] : S[+Q]                         %(wonder ask inquire)
VP[+it] -> [44] : NP, S[+R]                %(be)
VP[+it] -> [44] : X2, S[FIN]/X2            %(be)
VP -> [45] : PP[of]                        %(approve)
VP -> [48], X2[-SUBJ, CONJ and] :          %(come go)
VP -> X2[-SUBJ] : AP[+ADV]                 % ;is this too constrained?
VP -> X2[-SUBJ] : X2[+ADV]                 % ;adverbial adjuncts to VP
S -> X2[+SUBJ] : X2[+ADV]                  % ;adverbial adjuncts to S
V0[+NEG] -> X0[+AUX] : [SUBCAT not, ~BAR]  % ; "was not"



A2 -> X1 : (Deg)                           %
AP[-ADV] -> X1 : (AP[+ADV])                %
A1 -> [24] : PP[about]                     %(angry glad curious)
A1[AGR S] -> [25] : PP[to]                 %(apparent obvious certain)
A1 -> [26] : S[FIN]                        %(afraid aware amazed)
A1 -> [27] : S[BSE]                        %(insistent adamant determined)
A1 -> [28] : VP[INF]                       %(likely certain sure)
A1 -> [29] : V2[INF, +NORM]                %(anxious eager)
A1 -> [42] : V2[INF]/NP[-NOM]              %(easy)



; Spec ::= determiners, possessive phrases, limited set of quantifying
;          APs (e.g. many, few)
; N1 -> X1 : Modifier by Andrews(1983)

NP -> X1 : Spec                            %
NP[+SUBJ] -> X1 : NP[+POSS]                % ; X1 prevents multi-DET or -gerunds
; [+SUBJ] says i have a subject, allowing refl/recip
; to work properly. Multiple-possessives must be allowed.
NP[+POSS] -> X1 : "'s"                     %
N1 -> X1 : PP[+POSS]                       %
N1 -> X1 : PP                              %
N1 -> X1 : S[+R]                           %
N1 -> X1 : N0                              % ;for noun-noun modification
N1 -> [30] :                               %(death disappearance laughter)
N1 -> [31] : PP[with], PP[about]           %(argument consultation conversation)
N1 -> [32] : S[COMP that]                  %(belief implication proof notion idea)
N1 -> [33] : S[BSE, COMP that]             %(request insistence proposal)
N1 -> [34] : V2[INF]                       %(plan wish desire)
N1 -> [35] : PP[of]                        %(king sister inside love seduction criticism)
N1 -> [36] : PP[of], PP[to]                %(gift announcement surrender)
N1 -> [37] : PP[of, +GER]                  %(dislike admission memory habit prospect idea)

; depending on how LP statements are enforced, N-N mod rule might have to be
; N1 -> X1 : N0[SUBCAT {30,31,32,33,34,35,36,37}], i.e. heinous.
; because LP says SUBCAT always comes first.


;Mod ::= almost, totally, immediately, right, three feet, nearly
;Mod bears the SUBCAT feature and therefore precedes X1 in the first PP rule.
; maybe P1[+POSS] should be PP[+POSS] or the grammar crashes

PP -> X1 : Mod                             %
P1 -> [38] : NP                            %(to underneath in beside)
P1 -> [39] : PP[of]                        %(out forward in.front in.back)
P1[+POSS] -> [41] : NP[+POSS]              %(of)

S[COMP noBind] -> X2[-SUBJ, AGR X2] : X2               %
S[COMP noBind] -> X2[+SUBJ]/X2 : X2                    %
S[COMP that, FIN] -> S[COMP noBind] : [SUBCAT that]    %
S[COMP that, BSE] -> S[COMP noBind] : [SUBCAT that]    %
S[COMP whether] -> S[COMP noBind] : [SUBCAT whether]   %
S[COMP if] -> S[COMP noBind] : [SUBCAT if]             %
S[COMP for, INF] -> S[COMP noBind] : [SUBCAT for]      %


; iterating coordination schema, * means positive transitive closure +

X -> [CONJ and], X* :                      %
X -> [CONJ noBind], [CONJ and]* :          %
X -> [CONJ neither], [CONJ nor]* :         %
X -> [CONJ or], X* :                       %
X -> X, [CONJ or]* :                       %

; binary coordination schema
X -> [CONJ both], [CONJ and] :             %
X -> [CONJ either], [CONJ or] :            %
X -> X, [CONJ but] :                       %


; - LP STATEMENTS -

[SUBCAT] << [SUBCAT {unbound, noBind}]
[+N] << P2 << V2
[CONJ {both, either, neither, noBind}] << [CONJ {and, but, nor, or}]


METARULES

; passive metarule
!VP -> W, NP! => !VP[+PAS] -> W, (PP[by])!

; subject-aux inversion metarule
!V2[-SUBJ] -> W! => !V2[+INV, +AUX, +SUBJ, FIN] -> W, NP!

; extraposition metarule
!X2[AGR S] -> W! => !X2[+it] -> W, S!

; complement omission metarule
![+N, BAR 1] -> W! => ![+N, BAR 1] -> !

; slash termination metarule 1
!X -> W, X2! => !X -> W, X2[+NULL]!

; slash termination metarule 2
!X -> W, V2[+SUBJ, FIN]! => !X/NP -> W, V2[-SUBJ]!

; - SOME LEXICAL RULES -

; universal lexical rule,
; where two X2's are linked for all features but NULL
X2[+NULL]_1/X2_1>

<"quickly", A[+ADV]>
<"excessively", A[+ADV]>
<"aren't", V[+AUX, +NEG, PER 1, -PLU, +INV]>
<"that", [SUBCAT that, ~BAR]>
<"whether", [SUBCAT whether, ~BAR]>
<"if", [SUBCAT if, ~BAR]>
<"for", [SUBCAT for, ~BAR]>
<"both", [SUBCAT both, ~BAR]>
<"either", [SUBCAT either, ~BAR]>
<"neither", [SUBCAT neither, ~BAR]>
<"and", [SUBCAT and, ~BAR]>
<"but", [SUBCAT but, ~BAR]>
<"nor", [SUBCAT nor, ~BAR]>
<"or", [SUBCAT or, ~BAR]>
<"it", NP[PRO, -PLU, NFORM it]>
<"there", NP[PRO, NFORM there]>

; lexical rules to discharge PFORM
<"of", P0[PFORM of]>
<"to", P0[PFORM to]>
<"with", P0[PFORM with]>
<"about", P0[PFORM about]>
<"by", P0[PFORM by]>
<"for", P0[PFORM for]>

<"what", NP[+Q]>
<"which", NP[+Q]>
<"which", NP[+R]>
<"which", Det[+Q]>
<"which", Det[+R]>
<"whose", Det[+POSS, +Q]>
<"whose", Det[+POSS, +R]>
<"so", Deg>
<"too", Deg>
<"very", Deg>



D  Syntactic  features  in  GPSG  and  RGPSG 

These  notes  contain  an  informal  description  of  the  GPSG/RGPSG  feature 
system:  what  each  feature  means,  and  how  it  is  used.  A  typical  entry  is  of 
the  form: 

#<feature>  {<permissible-f eatur e-values >} 

[<f eature-specif ication-l>] :  what  <f eature-specif ication-l>  means. 
[<feature-specilication-2>] :  what  <f eature-specif ication-2>  means. 


#BAR {0,1,2,noBind}

[BAR 0]: for pure lexical entries, i.e. words
(nouns, verbs, prepositions, adjectives, adverbs). Almost
all preterminals will have the [BAR 0] feature.

[BAR 1]: for lexical entries and their complements
(e.g. for ‘[man [that I knew]]’ because it is missing its specifier).

[BAR 2]: for complete phrases (NP, VP, PP, AdjP), including both
complements and specifiers. The only [BAR 2] lexical categories are
for "there" and "it".

#CASE {NOM, OBL, OBJ} ; this feature is in HEAD.

[CASE NOM]: nominative case is assigned by a VP to its subject.
"I", "who", "he" are all NP[CASE NOM].

[CASE OBJ]: objective case is assigned by a verb to its NP complements.
"him", "me" are NP[CASE OBJ, OBL]. The GPSG specification [CASE ACC]
(accusative case) includes both the objective and oblique cases of
RGPSG.

[CASE OBL]: oblique case is assigned by a preposition to its NP.
"whom" is NP[CASE OBL], although this is a weak distinction.

#COMP {for, that, whether, if, noBind}

This feature labels a clause with the lexical item appearing in its
complementizer position. The value of COMP is morphologically realized.
For example, "whether John is a fool" is S[COMP whether].

#CONJ {and, both, but, neither, either, nor, or, noBind}

This feature labels a conjunct with the conjunction word that is
associated with it. For example, in the coordinate structure "Either
Bill or Bob died", the first conjunct would be NP[CONJ either], and
the second conjunct would be NP[CONJ or].

#GENDER {M,F,N} "gender"

[GENDER M]: for masculine gender
[GENDER F]: for feminine gender
[GENDER N]: for neuter gender

#GER {+,-} "gerund"

[GER +]: for gerunds (verbs functioning as nouns with an "-ing" ending).
[GER -]: for nouns that may not be gerunds.

#NEG  {+,-}  "negation" 

[NEG +]: for all negated categories ("not") and categories in the
scope of negation. Contractions ("wasn’t") should be separated into
"was not", and "not" should be replaced by the [NEG +] preterminal.

[NEG -]: for elements that may not be negated. No lexical entry is
[NEG -].

#NULL {+,-}

[NULL +]: for phonologically empty elements, such as traces and gaps.
[NULL -]: for phonologically overt elements. All lexical entries
are [NULL -].

#PER {1,2,3} "person"

[PER 1]: for first person nouns "I", "me"

[PER 2]: for second person nouns "you"

[PER 3]: for third person nouns "he", "she", "it"

#POSS {+,-}

[POSS +]: for possessives, i.e. nouns with genitive case such as "his".

#SUBCAT {1...48, for, that, whether, if,
and, both, either, neither, but, nor, or, not} "subcategorization"

All words have at least one value for their SUBCAT feature. In the
feature specification [SUBCAT n], n is a subcategorization index.

[SUBCAT 1]: intransitive verbs (die, eat, sing, run)

[SUBCAT 2]: verbs appearing in the VP [V NP] (e.g., SING a song)

[SUBCAT 3]: verbs appearing in the VP [V NP to NP] (GIVE a book to John)

[SUBCAT 4]: [V NP for NP] verbs (BUY a book for John)

[SUBCAT 5]: [V NP NP] verbs (GIVE Bill the book)

[SUBCAT 6]: [V NP PP[+LOC]] verbs (PUT the disk in the water)

The meaning of numerical SUBCAT values is determined by the ID rules
they appear in; see the GKPS or RGPSG grammars for English. The
nonnumerical SUBCAT values (for example, "for" or "that") represent
themselves: the terminal "for" is associated with the [SUBCAT for]
feature specification, "that" is associated with [SUBCAT that], and so
on. Note that there are (at least) two lexical entries for "for":

<"for", [SUBCAT for, BAR noBind]>

<"for", [-N,-V,BAR 0,PFORM for]>
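The two kinds of SUBCAT value can be told apart by type alone. The Python sketch below is one hypothetical way to describe a SUBCAT value: numerical indices are looked up in a small table of verb phrase frames, and nonnumerical values stand for the terminal itself. The table is an assumption that covers only the six frames listed in this section as read here (the double-object frame is taken to be index 5); in the grammar proper the meaning of an index comes from the rules that mention it, not from a table.

```python
# Frames for the numerical indices described above (illustrative subset only).
SUBCAT_FRAMES = {
    1: "[V]",
    2: "[V NP]",
    3: "[V NP to NP]",
    4: "[V NP for NP]",
    5: "[V NP NP]",
    6: "[V NP PP[+LOC]]",
}


def describe_subcat(value):
    """Describe a SUBCAT value: an int is an index, a str is a terminal."""
    if isinstance(value, int):
        # Indices outside the table are defined by the rules of the grammar.
        return SUBCAT_FRAMES.get(
            value, "frame given by the rule mentioning SUBCAT %d" % value)
    return 'the terminal "%s"' % value


if __name__ == "__main__":
    for v in (2, 6, 17, "for", "that"):
        print(v, "->", describe_subcat(v))
```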

#SUBJ  {+,-}  "subject" 

[SUBJ  +] :  for  categories  with  subjects  (clauses,  for  example). 

[SUBJ  -] :  for  categories  missing  a  subject  (verb  phrases,  for  example). 

#REMOR {RECP,REFL} "REciprocal/REflexive morphology"

[REMOR RECP]: for reciprocals ("each other").

[REMOR REFL]: for reflexives ("themselves").

#WHMOR {R,Q,FR,EX} "WH- morphology"

For wh- nouns, pronouns, and phrases, e.g. what, who, whom, where.
The entries for these words are highly idiosyncratic: see the lexicon
fragment in the RGPSG for English given below.

[WHMOR R]: for relative wh- pronouns (whom, which, whose) in relative clauses.
[WHMOR Q]: for interrogative pronouns (what, which, whose) in questions.

[WHMOR  FR] :  ? 

[WHMOR  EX] :  ? 

#AGR  {}  ;  {}  indicates  category-valued  feature 

AGR appears only on verbs in English. Its value is the type of subject
that the verb selects. For example, "frighten" can only take -ABSTRACT
noun phrase subjects, so its lexical entry might be
<"frighten", V0[SUBCAT 2, AGR NP[+PLU,-ABSTRACT]]>

#ADV  {+,-}  "adverbial" 

[ADV +]: for adverbial adjectives (i.e. adverbs).

[ADV -]: for nonadverbial adjectives.

#AUX  {+,-}  "auxiliary" 

[AUX +]: for auxiliary verbs ("is").

[AUX  -] :  for  verbs  that  are  not  auxiliaries. 

#INV {+,-} "invertable"

[INV +]: for invertable verbs, typically the finite auxiliaries. Items
that must invert (first-singular "aren't", "need") are labeled [INV +].
Examples: Aren’t you cold? Need we die?

[INV unBound]: for verbs that optionally invert (some finite auxiliaries).
[INV -]: for everything else, e.g. for the auxiliary "better" because
it can’t invert.

#LOC {+,-} "locative"

[LOC +]: for all categories that are locations (water, Egypt, in the house).
[LOC -]: for categories that aren’t locations (John, virtue).

#N  {+,-}  "nominal" 

[N  +] :  for  nouns  and  adjectives/adverbs. 

[N  -] :  for  verbs  and  prepositions. 

#NFORM {there, it, NORM}

[NFORM there]: only for the pleonastic noun phrase "there"

<"there", NP[PRO, NFORM there]>

[NFORM it]: only for the pleonastic noun phrase "it"

<"it", NP[PRO, -PLU, NFORM it]>

[NFORM  NORM]:  for  all  other  nouns.  Note  that  "it"  may  appear 
in  a  non-pleonastic  reading  as  a  pronoun. 

#PAS  {+,-}  "passive" 

[PAS  +] :  for  verbs  with  passive  morphology  (given,  kissed,  known,  believed). 
[PAS  -]  :  for  nonpassive  verbs. 

#PAST  {+,-}  "past  tense" 

[PAST  +] :  for  past  tense  verbs  (gave,  knew) . 

[PAST  -] :  for  present  tense  verbs  (give,  know). 

#PFORM {to, by, for, about, of, with, ...}

All prepositions will have a PFORM specification whose value is
the preposition itself, e.g. <"about", P0[PFORM about]>.

#PLU {+,-} "plural"

[PLU +]: for plural categories (e.g. plural nouns and verbs).

[PLU -]: for singular categories.

#PRD {+,-} "predicate"

[PRD +]: for predicates. This includes adjectives ("happy") and
predicate nominals (anything that denotes a set of things, e.g.
"doctor" because it denotes the set of all doctors).

[PRD -]: for words that cannot be predicates (e.g. determiners).

#V  {+,-}  "verbal" 

[V  +] :  for  verbs  and  adjectives/adverbs. 

[V  -] :  for  nouns  and  prepositions. 
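Taken together with the N feature described earlier, the V feature cross-classifies the major lexical categories: each category is picked out by one combination of the two values. A small Python sketch of that cross-classification follows; the table and function names are illustrative only, and the category names are those used in the descriptions of [N +], [N -], [V +], and [V -].

```python
# (N value, V value) -> major lexical category, per the N and V entries.
MAJOR_CATEGORY = {
    ("+", "-"): "noun",
    ("+", "+"): "adjective/adverb",
    ("-", "+"): "verb",
    ("-", "-"): "preposition",
}


def major_category(n, v):
    """Return the major category for the specifications [N n] and [V v]."""
    return MAJOR_CATEGORY[(n, v)]


if __name__ == "__main__":
    for n in "+-":
        for v in "+-":
            print("[N %s, V %s] -> %s" % (n, v, major_category(n, v)))
```

This agrees with the prepositional entry for "for" given under SUBCAT, which is specified as -N and -V.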

#VFORM {BSE, FIN, INF, PRP, PSP}

This feature labels a verb and its projections (V0, V1, VP, and S)
with the morphological class of the verb. In GPSG, [VFORM PAS] labels
passive verbs and their projections. In RGPSG, [PAS +] performs that
function.

[VFORM BSE]: uninflected, untensed base verb form
[VFORM FIN]: inflected, tensed finite verb form

[VFORM INF]: infinitival verb form. Signifies inflection w/o AGR plus V0,
e.g. "to be" or "to V0".

[VFORM PRP]: purposive verb form, e.g. "cleaning a bone" (not implemented)
[VFORM PSP]: past participle (not implemented)